INDEX
Explanations
The occurrence of the word "tw" and its variations, indicating a focus on social media references, particularly to Twitter.
NEGATIVE LOGITS
vetica        -0.16
ung           -0.15
تاب           -0.15
jav           -0.15
овар          -0.14
ليه           -0.14
UNG           -0.14
hyp           -0.14
тери          -0.14
istrovství    -0.14
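Raw exports of logit lists like these often show tokens as mojibake (e.g. "istrovstvÃŃ" or "åĽ´"). This is typical of GPT-2-style byte-level BPE, which maps every byte to a printable character. The sketch below assumes this feature's model uses that scheme, which the page does not state. It rebuilds the byte table and decodes a token back to text. The fallback that UTF-8-encodes characters missing from the table is a hypothetical heuristic for partially decoded strings such as "оваÑĢ". It can misfire on already-decoded Latin-1 characters.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2-style byte -> printable-character table (assumed scheme)."""
    # Printable bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\u00a1"), ord("\u00ac") + 1))
          + list(range(ord("\u00ae"), ord("\u00ff") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (controls, space, 0x7F-0xA0, 0xAD) are shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable UTF-8 text."""
    out = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            out.append(BYTE_DECODER[ch])
        else:
            # Hypothetical heuristic: treat the character as already decoded.
            out.extend(ch.encode("utf-8"))
    return out.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["istrovstv\u00c3\u0143", "\u00e5\u013d\u00b4", "\u00ec\u0142\u0122"]:
        print(raw, "->", decode_token(raw))
```

The table is invertible because every byte gets a distinct character. Decoding is therefore lossless whenever the token string came straight from the tokenizer vocabulary.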
POSITIVE LOGITS
viso          0.20
围            0.17
ór            0.15
ided          0.15
assi          0.15
nee           0.14
저            0.14
etik          0.14
654           0.13
esor          0.13
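Lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and taking the most positive and most negative entries. The page does not say how these values were computed, so the sketch below is only an assumed version of that procedure. It omits any normalization or folded LayerNorm. The names `logit_weights`, `top_and_bottom`, `decoder_dir` and `W_U` are hypothetical, and the toy matrices stand in for real model weights.

```python
from typing import Sequence


def logit_weights(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> list[float]:
    """Compute decoder_dir @ W_U, where W_U has shape (d_model, vocab)."""
    d_model, vocab = len(W_U), len(W_U[0])
    if len(decoder_dir) != d_model:
        raise ValueError("decoder direction must have length d_model")
    return [sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_and_bottom(logits: Sequence[float], vocab: Sequence[str], k: int):
    """Return the k most positive and k most negative (token, weight) pairs."""
    if k <= 0:
        return [], []
    order = sorted(range(len(logits)), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    W_U = [[0.2, -0.1, 0.0],
           [0.0, 0.4, -0.2]]
    logits = logit_weights([1.0, -0.5], W_U)
    pos, neg = top_and_bottom(logits, ["a", "b", "c"], k=1)
    print("positive:", pos, "negative:", neg)
```

Sorting the whole vocabulary is fine for a sketch. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` or `heapq.nsmallest` avoids the full sort.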
Activation density: 0.012%
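Activation density is usually read as the fraction of token positions on which the feature fires. A density of 0.012% is 1.2 × 10⁻⁴, or roughly one token in 8,300. The sketch below assumes that definition. It counts activations strictly above a threshold of zero. The page does not state the threshold or the token dataset used, and the function name is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


if __name__ == "__main__":
    # Toy data: 12 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_988 + [1.5] * 12
    density = activation_density(acts)
    print(f"{density:.3%}")
```

A feature this sparse needs a large sample of tokens before the estimate is stable. At 100,000 tokens, the density above rests on only a dozen firing positions.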