INDEX
Explanations
mentions of monkeys
references to monkeys
Negative Logits
Lauder    -0.83
inen      -0.82
reek      -0.78
sburgh    -0.77
EMP       -0.75
oppable   -0.73
encer     -0.71
encers    -0.71
OHN       -0.70
aci       -0.69
Positive Logits
pox       0.99
wrench    0.90
sey       0.87
patch     0.82
ãĥ£       0.79
oleon     0.76
bitten    0.73
monkeys   0.72
zee       0.70
monkey    0.70
Activations Density 0.021%