INDEX
Explanations
expressions of strong affirmation or denial
Negative Logits
gid       -0.16
age       -0.16
egr       -0.15
ei        -0.15
gomery    -0.15
Ïĩα       -0.14
reh       -0.14
ej        -0.14
ebb       -0.14
AGE       -0.14
Positive Logits
Frid          0.16
ippers        0.15
Barber        0.15
itchen        0.15
forcements    0.14
isky          0.14
opor          0.14
_nth          0.14
lear          0.14
ÑĤик          0.14
Activations Density 0.105%
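
The density figure above can be illustrated with a minimal sketch. It assumes that "activations density" means the fraction of token positions on which the feature's activation is above zero; the dashboard itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is "share of tokens where the feature fires",
    which may differ from the dashboard's own definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 2000 token positions, of which 2 fire.
sample = [0.0] * 1998 + [0.7, 1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.105% would mean the feature fires on roughly one token in a thousand, that is, a sparse feature.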