INDEX
Explanations
statements expressing personal opinions or thoughts
Negative Logits
emailer     -0.13
bulundu     -0.13
ُوا          -0.12
のは        -0.12
ことで      -0.12
.pretty     -0.12
ellig       -0.12
anken       -0.12
ール        -0.12
пу          -0.12
Positive Logits
there       1.20
there       0.98
There       0.90
There       0.87
THERE       0.85
هناك        0.72
theres      0.60
dort        0.51
.There      0.44
"There      0.43
Activations Density 0.906%