Explanations
words in a non-Latin script, possibly Cyrillic
characters or symbols, potentially indicating special or foreign textual elements
Negative Logits
manif        -0.67
dolls        -0.65
Riley        -0.65
conduc       -0.64
temptation   -0.64
visitation   -0.64
ktop         -0.64
bourg        -0.64
enegger      -0.61
caravan      -0.61
Positive Logits
оÐ           1.08
е            1.04
ÑĢ           1.04
¬            1.04
Į            1.03
°            1.02
Ĺ            1.00
ĺ            1.00
Ñĥ           0.99
¹            0.99
Activations Density 0.051%
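
Several of the positive-logit tokens above (for example ÑĢ and Ñĥ) look like raw UTF-8 bytes rendered through a byte-level BPE display alphabet rather than ordinary characters. The sketch below assumes, without confirmation from this page, that the tokenizer uses the GPT-2 byte-to-character table. Under that assumption it maps a displayed token back to its bytes and tries to decode them. The function names are hypothetical helpers, not part of any library.

```python
def gpt2_bytes_to_unicode():
    """Rebuild the byte -> display-character table used by GPT-2 byte-level BPE.

    Assumption: the model behind this dashboard uses the GPT-2 table.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            # Non-printable bytes are shifted to code points 256 and above.
            table[b] = chr(256 + shift)
            shift += 1
    return table


UNICODE_TO_BYTE = {ch: b for b, ch in gpt2_bytes_to_unicode().items()}


def token_to_bytes(token):
    """Map a displayed token string back to raw bytes.

    Characters outside the table (e.g. an already-decoded Cyrillic letter)
    are treated as plain text and re-encoded as UTF-8. That fallback is an
    assumption about how the page renders partially decoded tokens.
    """
    out = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            out.append(UNICODE_TO_BYTE[ch])
        else:
            out.extend(ch.encode("utf-8"))
    return bytes(out)


def describe(token):
    """Return (hex bytes, best-effort UTF-8 decoding) for a displayed token."""
    raw = token_to_bytes(token)
    return raw.hex(), raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ÑĢ", "Ñĥ", "Į", "°"]:
        print(repr(tok), describe(tok))
```

Under this assumption, ÑĢ corresponds to bytes d1 80 (Cyrillic р) and Ñĥ to d1 83 (Cyrillic у), while single-character tokens such as Į or ° map to lone continuation bytes that only form a character together with a preceding lead byte. That would be consistent with the "Cyrillic" explanation listed above, but it depends on the tokenizer guess.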