Explanations
the word "gorilla" appearing in various contexts
variations of the word "gorilla."
Negative Logits
iosyn        -0.83
ply          -0.83
paren        -0.76
orough       -0.75
leness       -0.75
matically    -0.74
den          -0.73
recomm       -0.72
log          -0.72
spot         -0.72
Positive Logits
ieri          0.87
Haram         0.84
istas         0.79
ength         0.78
Ammunition    0.76
esi           0.75
Rica          0.73
illas         0.69
izers         0.69
Gomez         0.68
Activation Density: 0.021%