Explanations
instances of the word "good."
Negative Logits
alers     -0.18
аду       -0.16
oley      -0.15
adero     -0.15
ville     -0.15
touched   -0.15
touch     -0.15
drs       -0.15
linger    -0.15
vag       -0.14
Positive Logits
tember    0.16
tons      0.15
//{{      0.15
ton       0.15
ector     0.15
ervo      0.14
utar      0.14
ภายใน     0.14
avr       0.14
ыт        0.14
Activations Density 0.017%