INDEX
Explanations
mentions of or related to the word "gorilla"
the term "gorilla."
Negative Logits
paren                 -0.85
ply                   -0.77
displayText           -0.74
susp                  -0.74
cards                 -0.73
schild                -0.72
iosyn                 -0.72
demand                -0.71
externalActionCode    -0.69
recomm                -0.69
Positive Logits
ieri                  0.94
illa                  0.84
istas                 0.82
Haram                 0.82
illas                 0.80
ength                 0.78
ño                    0.77
umin                  0.75
ignt                  0.74
Ammunition            0.74
Activations Density: 0.009%