INDEX
Explanations
proper nouns, particularly names and titles
Negative Logits
orama         -0.21
opi           -0.17
itsu          -0.15
fram          -0.15
fst           -0.15
emax          -0.15
illum         -0.15
vanished      -0.15
opak          -0.15
enerator      -0.15
Positive Logits
gard          0.15
indeed        0.15
ッカー        0.15
Tent          0.15
Couch         0.14
фици          0.14
-fi           0.14
931           0.14
ault          0.14
respectively  0.14
Activations Density 0.080%