INDEX
Explanations
repeated mentions of Alzheimer’s disease
Negative Logits
onda            -0.18
onder           -0.17
WithIdentifier  -0.15
uras            -0.15
/part           -0.15
elah            -0.15
misc            -0.14
lessness        -0.14
Mile            -0.13
Rs              -0.13
Positive Logits
ikk             0.17
-BEGIN          0.16
Victim          0.15
ehler           0.14
.wikipedia      0.14
наче            0.14
тери            0.14
/config         0.14
Hawks           0.14
aby             0.14
Activations Density 0.002%