Explanations
References to specific items and projects, such as games, movies, and tournaments
Negative Logits
gerald    -0.71
awaru     -0.70
otle      -0.68
ured      -0.66
manship   -0.64
iership   -0.63
Ik        -0.63
urated    -0.63
rolet     -0.62
etz       -0.62
Positive Logits
nd                 2.14
ND                 1.19
thirds             1.06
133                0.99
147                0.98
160                0.95
externalToEVAOnly  0.93
187                0.90
halves             0.89
245                0.88
Activation Density: 0.671%