INDEX
Explanations
references to personal relationships and social dynamics
Negative Logits
ymoon     -0.17
nev       -0.15
šak       -0.15
мена      -0.15
edback    -0.14
Escort    -0.14
tvrt      -0.14
utsche    -0.14
quet      -0.14
bì        -0.14
Positive Logits
174       0.18
239       0.15
233       0.15
ering     0.14
Lib       0.14
agen      0.14
çĬ¶       0.14
ht        0.14
Perm      0.14
ahr       0.13
Activations Density: 0.077%