INDEX
Explanations
occurrences of the word "Gorilla" in various contexts
Negative Logits
Token        Logit
ined         -0.16
ikat         -0.16
concurrent   -0.15
839          -0.15
atern        -0.15
IED          -0.15
Sheridan     -0.15
spectral     -0.14
pekt         -0.14
бора         -0.14
Positive Logits
Token        Logit
illas        0.30
illa         0.28
ILLA         0.21
izia         0.21
ansson       0.19
ONTAL        0.19
izont        0.19
leston       0.19
ordo         0.18
untu         0.18
Activations Density: 0.008%