Explanations
countries, organizations, and specific actions or situations
phrases indicating restrictions or limitations
Negative Logits
Seym        -0.76
Vaugh       -0.67
thous       -0.62
ãĤ´ãĥ³      -0.61
mathemat    -0.56
nodd        -0.54
Kardash     -0.53
Math        -0.53
anwhile     -0.52
edIn        -0.52
Positive Logits
geon        0.65
hee         0.60
ree         0.59
brew        0.54
Ĥİ          0.53
Ī           0.52
Ĥª          0.51
TOR         0.50
ndra        0.50
aves        0.49
Activations Density 0.611%