INDEX
Explanations
references to the word "gorilla" in the text
the word "gorilla" in various contexts
Negative Logits
ply         -0.93
orough      -0.90
schild      -0.89
den         -0.81
iosyn       -0.80
log         -0.80
matically   -0.77
daq         -0.75
ensible     -0.75
race        -0.75
Positive Logits
istas       0.83
Ammunition  0.83
ieri        0.81
esi         0.81
ño          0.79
Haram       0.76
Gomez       0.76
ength       0.75
ista        0.72
emonium     0.70
Activations Density 0.035%