Publications

Publications (updated 17.2.2022)

2022

  1. A. Kanervisto, V. Hautamäki, T. Kinnunen, J. Yamagishi, “Optimizing Tandem Speaker Verification and Anti-Spoofing Systems,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 477-488, 2022.
  2. R. Tao, K.A. Lee, R.K. Das, V. Hautamäki, H. Li, “Self-supervised Speaker Recognition with Loss-gated Learning”, accepted to IEEE ICASSP, Singapore, 2022.
  3. X. Liu, M. Sahidullah, T. Kinnunen, “Learnable Nonlinear Compression for Robust Speaker Verification”, accepted to IEEE ICASSP, Singapore, 2022.

2021

  1. Y. Ma, K.A. Lee, V. Hautamäki, H. Li, “PL-EERSR: Perceptual Loss Based End-to-End Robust Speaker Representation Extraction”, to appear in IEEE Automatic Speech Recognition and Understanding workshop, 2021
  2. K. Hechmi, T.N. Trong, V. Hautamäki, T. Kinnunen, “VoxCeleb Enrichment for Age and Gender Recognition”, to appear in IEEE Automatic Speech Recognition and Understanding workshop, 2021
  3. X. Liu, M. Sahidullah, T. Kinnunen, “Optimized Power Normalized Cepstral Coefficients towards Robust Deep Speaker Verification”, to appear in IEEE Automatic Speech Recognition and Understanding workshop, 2021
  4. X. Liu, M. Sahidullah, T. Kinnunen, “Parameterized Channel Normalization for Far-field Deep Speaker Verification”, to appear in IEEE Automatic Speech Recognition and Understanding workshop, 2021
  5. X. Liu, M. Sahidullah, T. Kinnunen, “Optimizing Multi-Taper Features for Deep Speaker Verification”, IEEE Signal Processing Letters, 28: 2187–2191, October 2021.
  6. L. Tavi, T. Kinnunen, E. Meister, R. González Hautamäki, A. Malmi, “Articulation During Voice Disguise: A Pilot Study”, Proc. Speech and Computer (SPECOM’21), Springer LNAI 12997, pp. 680–691, St. Petersburg, Russia, September 2021.
  7. T. Kinnunen, A. Nautsch, M. Sahidullah, N. Evans, X. Wang, M. Todisco, H. Delgado, J. Yamagishi, K.A. Lee, “Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing”, Proc. Interspeech, pp. 4299–4303, Brno, Czech Republic, 2021.
  8. A. Kanervisto, C. Scheller, Y. Schraner and V. Hautamäki, “Distilling Reinforcement Learning Tricks for Video Games”, IEEE Conference on Games, Virtual, 2021.
  9. B. Chettri, R. González Hautamäki, M. Sahidullah, T. Kinnunen, “Data Quality as Predictor of Voice Anti-Spoofing Generalization”, Proc. Interspeech, pp. 1659–1663, Brno, Czech Republic, 2021.
  10. J. Turkia, L. Mehtätalo, U. Schwab, and V. Hautamäki, “Mixed-Effect Bayesian Network Reveals Personal Effects of Nutrition”, Scientific Reports, Vol. 11, No. 12016, 2021.
  11. K.A. Lee, V. Vestman, and T. Kinnunen, “ASVtorch Toolkit: Speaker Verification with Deep Neural Networks”, SoftwareX, Volume 14, 100697, June 2021.
  12. K. Ishihara, A. Kanervisto, J. Miura and V. Hautamäki, “Multi-task Learning with Attention for End-to-end Autonomous Driving”, CVPR 2021 Workshop on Autonomous Driving, 2021.
  13. X. Liu, M. Sahidullah, T. Kinnunen, “Learnable MFCCs for Speaker Verification”, Proc. IEEE Int. Symp. Circuits and Systems (ISCAS 2021), Daegu, Korea, May 2021
  14. A. Nautsch, X. Wang, N. Evans, T. Kinnunen, V. Vestman, M. Todisco, H. Delgado, M. Sahidullah, J. Yamagishi, K.A. Lee, “ASVspoof 2019: Spoofing Countermeasures for the Detection of Synthesized, Converted and Replayed Speech”, IEEE Transactions on Biometrics, Behavior, and Identity Science, 3(2): 252–265, April 2021
  15. M. Sahidullah, A.K. Sarkar, V. Vestman, X. Liu, R. Serizel, T. Kinnunen, Z.-H. Tan, E. Vincent, “UIAI System for Short-Duration Speaker Verification Challenge 2020”, Proc. IEEE Spoken Language Technology Workshop (SLT 2021), Shenzhen, China, January 2021

2020

  1. R. K. Das, T. Kinnunen, W.-C. Huang, Z. Ling, J. Yamagishi, Y. Zhao, X. Tian, T. Toda, “Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions”, Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge, pp. 99–120, 2020.
  2. Y. Zhao, W.-C. Huang, X. Tian, J. Yamagishi, R.K. Das, T. Kinnunen, Z. Ling, T. Toda, “Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion”, Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge, pp. 80–98, 2020.
  3. A. Sholokhov, T. Kinnunen, V. Vestman, K.A. Lee, “Extrapolating False Alarm Rates in Automatic Speaker Verification”, Proc. Interspeech 2020, pp. 4218–4222, Shanghai, China, October 2020
  4. R. K. Das, X. Tian, T. Kinnunen, H. Li, “The Attacker’s Perspective on Automatic Speaker Verification: An Overview”, Proc. Interspeech 2020, pp. 4213–4217, Shanghai, China, October 2020
  5. R. González Hautamäki and T. Kinnunen, “Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data”, Proc. Interspeech 2020, pp. 4313–4317, Shanghai, China, October 2020
  6. X. Liu, M. Sahidullah, T. Kinnunen, “A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings”, Proc. Interspeech 2020, pp. 3221–3225, Shanghai, China, October 2020
  7. T. Kinnunen, H. Delgado, N. Evans, K.A. Lee, V. Vestman, A. Nautsch, M. Todisco, X. Wang, M. Sahidullah, J. Yamagishi, D.A. Reynolds, “Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals”, IEEE/ACM Transactions on Audio, Speech, and Language Processing
  8. I. Kukanov, T.N. Trong, V. Hautamäki, S.M. Siniscalchi, V.M. Salerno, K.A. Lee, “Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28: 682–695, 2020
  9. A. Kanervisto, C. Scheller, V. Hautamäki, “Action Space Shaping in Deep Reinforcement Learning”, IEEE Conference on Games 2020
  10. A. Kanervisto, J. Pussinen, V. Hautamäki, “Benchmarking End-to-End Behavioural Cloning on Video Games”, IEEE Conference on Games 2020
  11. X. Wang, J. Yamagishi, M. Todisco, H. Delgado, A. Nautsch, N. Evans, M. Sahidullah, V. Vestman, T. Kinnunen, K.A. Lee, L. Juvela, P. Alku, Y.-H. Peng, H.-T. Hwang, Y. Tsao, H.-M. Wang, S. Le Maguer, M. Becker, F. Henderson, R. Clark, Y. Zhang, Q. Wang, Y. Jia, K. Onuma, K. Mushika, T. Kaneda, Y. Jiang, L.-J. Liu, Y.-C. Wu, W.-C. Huang, T. Toda, K. Tanaka, H. Kameoka, I. Steiner, D. Matrouf, J.-F. Bonastre, A. Govender, S. Ronanki, J.-X. Zhang, Z.-H. Ling, “ASVspoof 2019: a large-scale public database of synthetic, converted and replayed speech”, Computer Speech & Language, 64, November 2020
  12. A. Kanervisto, J. Karttunen, V. Hautamäki, “Playing Minecraft with Behavioural Cloning”, PMLR post proceedings – Competition Track@NeurIPS2019, 2020
  13. B. Chettri, T. Kinnunen, E. Benetos, “Deep Generative Variational Autoencoding for Replay Spoof Detection in Automatic Speaker Verification”, Computer Speech & Language, 63: 1–18, September 2020
  14. V. Vestman, K.A. Lee, T. Kinnunen, “Neural i-Vectors”, Proc. Odyssey 2020, pp. 67–74, Tokyo, Japan, Nov. 2020
  15. B. Chettri, T. Kinnunen, E. Benetos, “Subband modeling for Spoofing Detection in Automatic Speaker Verification”, Proc. Odyssey 2020, pp. 341–348, Tokyo, Japan, Nov. 2020
  16. A. Kanervisto, V. Hautamäki, T. Kinnunen, J. Yamagishi, “An Initial Investigation on Optimizing Tandem Speaker Verification and Countermeasure Systems Using Reinforcement Learning”, Proc. Odyssey 2020, pp. 151–158, Tokyo, Japan, Nov. 2020
  17. J. Karttunen, A. Kanervisto, V. Kyrki, V. Hautamäki, “From Video Game to Real Robot: The Transfer Between Action Spaces”, IEEE ICASSP, Virtual conference, May 2020

2019

  1. R. González Hautamäki and T. Kinnunen, “Towards Controlling False Alarm — Miss Trade-Off in Perceptual Speaker Comparison via Non-Neutral Listening Task Framing”, Proc. IEEE ASRU, December 2019, Singapore
  2. A. Sholokhov, T. Kinnunen, V. Vestman, K.A. Lee, “Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores”, Computer Speech & Language, 60: 1–19, March 2020
  3. A. Kato and T. Kinnunen, “Statistical Regression Models for Noise Robust F0 Estimation Using Recurrent Deep Neural Networks”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(12): 2336–2349, December 2019.
  4. R. González Hautamäki, V. Hautamäki, T. Kinnunen, “On Limits of Automatic Speaker Verification: Explaining Degraded Recognizer Score Through Acoustic Changes Resulting from Voice Disguise”, Journal of the Acoustical Society of America, 146(1): 693–704, July 2019
  5. V. Vestman, T. Kinnunen, R. González Hautamäki, M. Sahidullah, “Voice Mimicry Attacks Assisted by Automatic Speaker Verification”, Computer Speech & Language, 59: 36–54, January 2020.
  6. V. Vestman, K. A. Lee, T. Kinnunen, T. Koshinaka, “Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration”, Proc. Interspeech 2019, pp. 351–355, Graz, Austria, September 2019
  7. M. Todisco, X. Wang, V. Vestman, M. Sahidullah, H. Delgado, A. Nautsch, J. Yamagishi, N. Evans, T. Kinnunen, K. A. Lee, “ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection”, Proc. Interspeech 2019, pp. 1008–1012, Graz, Austria, September 2019
  8. B. Soomro, A. Kanervisto, T.N. Trong and V. Hautamäki, “Towards Debugging Deep Neural Networks by Generating Speech Utterances”, Proc. Interspeech 2019, pp. 3213–3217, Graz, Austria, September 2019 [GitHub]
  9. K. A. Lee, V. Hautamäki, T. Kinnunen, H. Yamamoto, K. Okabe, V. Vestman, J. Huang, G. Ding, H. Sun, A. Larcher, R. K. Das, H. Li, M. Rouvier, P. Bousquet, W. Rao, Q. Wang, C. Zhang, F. Bahmaninezhad, H. Delgado, M. Todisco, Q. Wang, L. Guo, T. Koshinaka, J. Zhang, K. Shinoda, T. N. Trong, M. Sahidullah, F. Lu, Y. Tang, M. Tu, K. K. Teh, H. D. Tran, K. K. George, I. Kukanov, F. Desnous, J. Yang, E. Yılmaz, L. Xu, J. Bonastre, C. Xu, Z. H. Lim, E. S. Chng, S. Ranjan, J. H. L. Hansen, J. Patino, N. Evans, “I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences”, Proc. Interspeech 2019, pp. 1497–1501, Graz, Austria, September 2019
  10. X. Wu, E. Granger, T. Kinnunen, X. Feng, A. Hadid, “Audio-Visual Kinship Verification in the Wild”, 12th IAPR International Conference On Biometrics (ICB 2019), Crete, Greece, June 2019. [PDF]
  11. T. Kinnunen, R. González Hautamäki, V. Vestman, M. Sahidullah, “Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection”, Proc. IEEE ICASSP, pp. 6146–6150, Brighton, UK, May 2019 [PDF]
  12. V. Vestman, B. Soomro, A. Kanervisto, V. Hautamäki, T. Kinnunen, “Who Do I Sound Like? Showcasing Speaker Recognition Technology by YouTube Voice Search”, Proc. IEEE ICASSP, pp. 5781–5785, Brighton, UK, May 2019 [PDF]
  13. E. Jokinen, R. Saeidi, T. Kinnunen, P. Alku, “Vocal Effort Compensation for MFCC Feature Extraction in a Shouted Versus Normal Speaker Recognition Task”, Computer Speech & Language, 53: 1-11, January 2019 IF=1.776 JF=

2018

  1. M. Sahidullah, H. Delgado, M. Todisco, T. Kinnunen, N. Evans, J. Yamagishi, K.A. Lee, “Introduction to Voice Presentation Attack Detection and Recent Advances”, book chapter in Handbook of Biometric Anti-Spoofing: Presentation Attack Detection, S. Marcel, M.S. Nixon, J. Fierrez, N. Evans (Eds.), Springer, 2018 [PDF]
  2. F. Fang, J. Yamagishi, I. Echizen, M. Sahidullah, T. Kinnunen, “Transforming Acoustic Characteristics to Deceive Playback Spoofing Countermeasures of Speaker Verification Systems”, Proc. IEEE Int. Workshop on Information Forensics and Security (WIFS 2018), Hong Kong, China, 2018 [PDF]
  3. M. Todisco, H. Delgado, K.A. Lee, M. Sahidullah, N. Evans, T. Kinnunen, J. Yamagishi, “Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion”, Proc. Interspeech 2018, pp. 77-81, Hyderabad, India, September 2018 [PDF] JF=1
  4. A. Kato, T. Kinnunen, “Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks”, Proc. Interspeech 2018, pp. 327-331, Hyderabad, India, September 2018 [PDF] JF=1
  5. S. Sieranoja, M. Sahidullah, T. Kinnunen, J. Komulainen, A. Hadid, “Audiovisual Synchrony Detection with Optimized Audio Features”, accepted to IEEE 3rd Int. Conference on Signal and Image Processing (ICSIP 2018), Shenzhen, China, July 2018 [PDF] JF=0
  6. T. N. Trong, V. Hautamäki, and K. Jokinen, “Staircase Network: Structural Language Identification via Hierarchical Attentive Units”, Proc. Odyssey 2018, pp. 60-67, Les Sables d’Olonne, France, June 2018 JF=1
  7. T. Kinnunen, K.A. Lee, H. Delgado, N. Evans, M. Todisco, M. Sahidullah, J. Yamagishi, D.A. Reynolds, “t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification”, Proc. Odyssey 2018, pp. 312-319, Les Sables d’Olonne, France, June 2018 [PDF] JF=1
  8. T. Kinnunen, J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, Z. Ling, “A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment”, Proc. Odyssey 2018, pp. 187-194, Les Sables d’Olonne, France, June 2018 [PDF (original)], [PDF (corrected version with a bug fix, from arXiv)] JF=1
  9. J. Lorenzo-Trueba, J. Yamagishi, T. Toda, D. Saito, F. Villavicencio, T. Kinnunen, Z. Ling, “The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods”, Proc. Odyssey 2018, pp. 195-202, Les Sables d’Olonne, France, June 2018 [PDF] [The data and challenge results are available here] JF=1
  10. V. Vestman and T. Kinnunen, “Supervector Compression Strategies to Speed up I-Vector System Development”, Proc. Odyssey 2018, pp. 357-364, Les Sables d’Olonne, France, June 2018 [PDF] JF=1
  11. R. González Hautamäki, A. Kanervisto, V. Hautamäki, T. Kinnunen, “Perceptual Evaluation of the Effectiveness of Voice Disguise by Age Modification”, Proc. Odyssey 2018, pp. 320-326, Les Sables d’Olonne, France, June 2018 [PDF] JF=1
  12. A. Kato and T. Kinnunen, “A Regression Model of Recurrent Deep Neural Networks for Noise Robust Estimation of the Fundamental Frequency Contour of Speech”, Proc. Odyssey 2018, pp. 275-282, Les Sables d’Olonne, France, June 2018 [PDF] JF=1
  13. J. Lorenzo-Trueba, F. Fang, X. Wang, I. Echizen, J. Yamagishi, T. Kinnunen, “Can we steal your vocal identity from the Internet? Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data”, Proc. Odyssey 2018, pp. 240-247, Les Sables d’Olonne, France, June 2018 [PDF] JF=1
  14. H. Delgado, M. Todisco, M. Sahidullah, N. Evans, T. Kinnunen, K.A. Lee, J. Yamagishi, “ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements”, Proc. Odyssey 2018, pp. 296-303, Les Sables d’Olonne, France, June 2018 [PDF] JF=1
  15. T. Leppänen, H. Vrzakova, R. Bednarik, A. Kanervisto, A. Elomaa, A. Huotarinen, P. Bartczak, M. Fraunberg, J. Jääskeläinen, “Augmenting Microsurgical Training: Microsurgical Instrument Detection Using Convolutional Neural Networks”, Proc. CBMS 2018, pp. 211-216, Karlstad, Sweden, June 2018 JF=1
  16. T. N. Trong, K. Jokinen and V. Hautamäki, “Enabling Spoken Dialogue Systems for Low-Resourced Languages – End-to-End Dialect Recognition for North Sami”, IWSDS 2018, Singapore, May 2018 [Best paper award]
  17. I. Kukanov, V. Hautamäki and K.A. Lee, “Maximal Figure-of-Merit Embedding for Multi-label Audio Classification”, Proc. ICASSP 2018, pp. 136-140, Calgary, Canada, April 2018 JF=1
  18. V. Vestman, D. Gowda, M. Sahidullah, P. Alku, and T. Kinnunen, “Speaker Recognition from Whispered Speech: a Tutorial Survey and an Application of Time-Varying Linear Prediction”, Speech Communication, 99: 62-79, May 2018 IF=1.585 JF=2
  19. M. Sahidullah, D. Thomsen, R. González Hautamäki, T. Kinnunen, Z.-H. Tan, R. Parts, and M. Pitkänen, “Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones”, IEEE/ACM Trans. on Audio, Speech, and Language Processing, 26(1): 44-56, January 2018 [PDF] IF=2.95 JF=2
  20. A. Sholokhov, M. Sahidullah, T. Kinnunen, “Semi-Supervised Speech Activity Detection with an Application to Automatic Speaker Verification”, Computer Speech & Language, 47: 132-156, January 2018 IF=1.90 JF=2

 

The group separated from the machine learning group in 2018. Articles published before 2018 can be found here.