INDEX
Explanations
Mentions of specific names or terms, such as "ron" and "ape"
Proper nouns and significant numerical values related to organizations or products
Negative Logits
ĸļ             -1.09
YA             -0.92
¥µ             -0.92
GoldMagikarp   -0.88
yah            -0.87
roxy           -0.85
»Ĵ             -0.81
Parables       -0.79
YN             -0.78
yrinth         -0.77

Positive Logits
co              0.90
^               0.87
CO              0.83
Ange            0.77
CO              0.75
ãĤ¬             0.75
^               0.70
oblig           0.69
Hoff            0.69
(token missing) 0.68
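The two lists above rank tokens by how strongly the feature pushes their output logit down or up. The sketch below assumes, as is common for sparse-autoencoder feature dashboards, that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The helper names `logit_effects` and `top_and_bottom` are hypothetical, and the toy two-dimensional vectors are made-up numbers, not data from this feature.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper (assumed method): dot the feature's decoder
    direction with each token's unembedding vector to get its logit effect."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return negative, positive


# Toy example: a 2-d residual stream and a 5-token vocabulary (made-up numbers).
decoder_dir = [1.0, -0.5]
unembed = {
    "co":   [0.6, -0.6],
    "CO":   [0.5, -0.4],
    "YA":   [-0.6, 0.6],
    "roxy": [-0.5, 0.7],
    "the":  [0.1, 0.2],
}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("negative:", neg)
print("positive:", pos)
```

On a real model the same ranking would run over the full vocabulary, so only the extremes (like the ten shown per list here) are displayed.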
Activation density: 0.297%
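Activation density is presumably the share of sampled tokens on which the feature fires; at 0.297%, that is roughly 3 tokens in every 1,000. The minimal sketch below assumes "fires" means an activation strictly above zero; the `threshold` parameter and the sample data are illustrative assumptions.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`
    (the zero threshold is an assumption, not taken from the dashboard)."""
    total = active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return 100.0 * active / total


# Made-up sample: the feature fires on 3 of 1,000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 2.0]
print(f"{activation_density(sample):.3f}%")  # prints 0.300%
```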