Explanations
mentions of rodents, particularly rats
mentions of rats or terms associated with them
Negative Logits
Token       Logit
Ͻ           -0.77
»Ĵ          -0.71
alach       -0.69
ģĸ          -0.68
qua         -0.68
ãĥĨãĤ£      -0.66
verson      -0.65
¬¼          -0.65
Adviser     -0.62
xtap        -0.62
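Several negative-logit tokens (ãĥĨãĤ£, »Ĵ, ģĸ, ¬¼) look garbled. That pattern is typical of GPT-2-style byte-level BPE, which displays each raw byte as a printable character; the dashboard does not name its tokenizer, so reading these strings that way is an assumption. The sketch below inverts the GPT-2 byte-to-character table. `decode_token` is a hypothetical helper, and characters outside the table (such as Ͻ, which appears to be shown already decoded) are assumed to be plain text and passed through.

```python
def bytes_to_unicode():
    """Return GPT-2's table mapping each byte value (0-255) to a printable character."""
    # Bytes that are already printable map to the character with the same code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    # Remaining bytes (control characters, space, etc.) are shifted to code points 256 and up.
    offset = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> raw byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into readable text (hypothetical helper)."""
    raw = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Assumption: a character outside the byte table was already decoded.
            raw.extend(ch.encode("utf-8"))
    # Incomplete UTF-8 sequences become U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥĨãĤ£", "»Ĵ", "alach", "Ͻ"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption ãĥĨãĤ£ is the Japanese katakana ティ, while »Ĵ, ģĸ and ¬¼ consist only of UTF-8 continuation bytes, so on their own they decode to replacement characters.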

Positive Logits
Token       Logit
chet         1.07
holes        0.97
fish         0.87
dies         0.87
lings        0.86
dog          0.85
ox           0.83
che          0.80
Rats         0.78
cat          0.78
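The positive and negative lists read as the two extremes of a single per-token score: the tokens whose logits the feature raises most and lowers most. The dashboard does not say how those scores are computed (a common choice is to project the feature's decoder direction through the model's unembedding matrix); assuming a precomputed token-to-score mapping, building both lists is a sort in each direction. The sketch reuses some of the values listed above, and `top_and_bottom` is a hypothetical helper.

```python
def top_and_bottom(scores: dict[str, float], k: int):
    """Return the k highest-scoring and k lowest-scoring tokens (hypothetical helper)."""
    # sorted() is stable even with reverse=True, so tied scores keep dictionary order.
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    bottom = sorted(scores, key=scores.get)[:k]
    return top, bottom


# A subset of the logit values shown in the dashboard lists.
scores = {
    "chet": 1.07, "holes": 0.97, "fish": 0.87, "dies": 0.87, "Rats": 0.78,
    "Ͻ": -0.77, "alach": -0.69, "qua": -0.68, "xtap": -0.62,
}

top, bottom = top_and_bottom(scores, k=3)
print("positive:", top)
print("negative:", bottom)
```

Because fish and dies tie at 0.87, which one makes a top-3 cut depends on how ties are broken; the stable sort here keeps the order in which they were listed.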

Activation Density: 0.007%
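Activation density is usually read as the percentage of evaluated tokens on which the feature's activation is above zero. The dashboard does not state its sample size, so the counts below are illustrative only: 0.007% corresponds to about 7 firing tokens per 100,000. `activation_density` is a hypothetical helper.

```python
def activation_density(activations) -> float:
    """Percentage of tokens whose activation is above zero (hypothetical helper)."""
    activations = list(activations)
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100 * firing / len(activations)


# Illustrative data only: 7 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(7):
    acts[i * 10_000] = 1.0

print(f"{activation_density(acts):.3f}%")
```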