INDEX
Explanations
references to the movie "Zootopia" or related terms
Negative Logits
Token      Logit
MIT        -0.70
exha       -0.68
galvan     -0.64
ĻĤ         -0.64
misunder   -0.63
agon       -0.62
conduc     -0.62
SER        -0.62
cov        -0.60
BER        -0.60
Positive Logits
Token      Logit
strap      1.18
hing       1.15
oot        1.14
hed        1.09
stra       1.02
sie        1.01
opia       1.01
ypes       0.99
ropolis    0.97
eers       0.95
Activations Density: 0.016%