INDEX
Explanations
phrases indicating generalizations or qualifiers in statements
Negative Logits
uder      -0.16
Jou       -0.15
eries     -0.15
oure      -0.14
uve       -0.14
Escort    -0.14
والس      -0.14
utory     -0.13
ево       -0.13
Fres      -0.13
Positive Logits
contro        0.15
ambi          0.15
лага          0.14
istrovství    0.14
805           0.14
olen          0.13
as            0.13
ωμα           0.13
apart         0.13
contra        0.13
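The dashboard does not say how the two logit lists are computed. A common convention for sparse-autoencoder feature pages is to project the feature's decoder direction through the model's unembedding matrix and keep the k largest and k smallest scores. The sketch below assumes that convention. `top_logit_tokens`, `dot`, and the argument layout (one unembedding row per vocabulary entry) are hypothetical names chosen for illustration, and any extra scaling or normalization the real dashboard might apply is not modeled.

```python
from typing import List, Sequence, Tuple


# Hypothetical helper: plain dot product so the sketch needs no third-party packages.
def dot(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_logit_tokens(
    decoder_vec: Sequence[float],
    unembed_rows: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumption: unembed_rows[i] is the unembedding vector for vocab[i], so the
    direct logit contribution of the feature to token i is dot(decoder_vec, row).
    """
    if len(unembed_rows) != len(vocab):
        raise ValueError("need one unembedding row per vocabulary entry")
    scores = [(tok, dot(decoder_vec, row)) for tok, row in zip(vocab, unembed_rows)]
    # Largest values first: tokens the feature pushes up.
    positive = sorted(scores, key=lambda p: p[1], reverse=True)[:k]
    # Smallest values first: tokens the feature pushes down.
    negative = sorted(scores, key=lambda p: p[1])[:k]
    return positive, negative
```

With a toy vocabulary `["a", "b", "c"]`, decoder vector `[1.0, 0.0]`, and rows `[[0.5, 0.0], [-0.2, 1.0], [0.1, 3.0]]`, `k=1` yields `[("a", 0.5)]` as the top positive entry and `[("b", -0.2)]` as the top negative one. A real model would use tensors and a top-k routine rather than a full sort, but the ranking logic is the same.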
Activation Density: 0.034%
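Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold (0.0 by default) and reports the result as a percentage. The function name and the threshold parameter are hypothetical; the dashboard does not state its exact definition.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption: "density" means the fraction of tokens with activation > threshold,
    reported as a percentage (threshold 0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

Under this definition, 0.034% corresponds to roughly 34 active positions per 100,000 tokens, so the feature is very sparse.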