INDEX
Explanations
negations or expressions of lack or absence
Negative Logits
Durant      -0.16
ilden       -0.15
Thornton    -0.15
anning      -0.14
anford      -0.14
enburg      -0.14
úc          -0.14
Commerce    -0.13
Chapman     -0.13
ario        -0.13
Positive Logits
rame        0.16
izr         0.15
abal        0.15
дост        0.15
алеж        0.15
acker       0.14
æĤ          0.14
747         0.14
ksam        0.14
linky       0.14
Activations Density 0.003%