Explanations
phrases indicating potential conflicts of interest
Negative Logits
off         -0.17
apa         -0.16
ichen       -0.15
اصله        -0.14
aved        -0.14
iline       -0.14
_throw      -0.14
族自治      -0.14   (Chinese, "ethnic autonomy")
ovsky       -0.14
cka         -0.14
Positive Logits
interest     0.45
Interest     0.42
-interest    0.37
Interest     0.36
interest     0.36
_interest    0.32
interests    0.30
interes      0.28
интерес      0.26   (Russian, "interest")
interesse    0.25
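The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly produced, assuming the logit effect of a feature is its decoder direction multiplied through the model's unembedding matrix. The function name `top_logit_effects`, the parameters `w_dec` and `w_u`, and the toy vocabulary and weights are hypothetical and not taken from this page.

```python
def top_logit_effects(w_dec, w_u, vocab, k=3):
    """Rank tokens by the logit effect of one feature.

    Assumption (hypothetical setup): effect[t] = sum_i w_dec[i] * w_u[i][t],
    i.e. the feature's decoder direction (length d_model) multiplied into the
    unembedding matrix (d_model rows, one column per vocabulary token).
    """
    d_model = len(w_dec)
    effects = [
        sum(w_dec[i] * w_u[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect: most negative tokens come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # k most negative, strongest first
    positive = ranked[::-1][:k]    # k most positive, strongest first
    return negative, positive


if __name__ == "__main__":
    # Toy example with made-up weights: d_model = 2, four tokens.
    vocab = ["interest", "interests", "off", "the"]
    w_u = [
        [2, 1, -1, 0],   # unembedding row for hidden dimension 0
        [1, 1, 0, -3],   # unembedding row for hidden dimension 1
    ]
    w_dec = [1.0, 0.5]
    neg, pos = top_logit_effects(w_dec, w_u, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

With real models the same computation is a single matrix-vector product (for example in NumPy or PyTorch) over the full vocabulary; plain lists keep the sketch dependency-free.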
Activation Density: 0.003%
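Activation density is the share of tokens on which the feature is active, so 0.003% corresponds to roughly 3 tokens in every 100,000. A minimal sketch of the arithmetic, assuming "active" means an activation strictly above a threshold of zero over some evaluation set of tokens; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Made-up data: 3 active tokens out of 100,000.
    acts = [0.0] * 100_000
    for i in (10, 5_000, 70_000):
        acts[i] = 1.2
    print(f"{activation_density(acts):.3f}%")
```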