INDEX
Explanations
mentions of gorillas
references to gorillas
Negative Logits
broadcast      -0.73
titanium       -0.63
tele           -0.62
Speaker        -0.62
prop           -0.62
release        -0.61
ab             -0.61
TD             -0.61
examination    -0.60
refund         -0.60
Positive Logits
illas          4.40
illa           1.59
anches         1.26
illes          1.20
adas           1.04
ierrez         1.03
antes          1.02
cules          1.02
alez           1.02
illi           0.99
Activations Density: 0.012%