INDEX
Explanations
words indicating severity or intensity of issues and problems
Negative Logits
wick     -0.18
cales    -0.17
orp      -0.16
alon     -0.15
ukt      -0.14
chn      -0.14
wers     -0.14
å¯Ł      -0.14
edo      -0.14
acker    -0.14
Positive Logits
ocaly     0.16
TRL       0.15
itas      0.15
hani      0.15
-league   0.15
rogate    0.14
ç¦        0.14
urai      0.14
metics    0.14
addock    0.14
Activations Density: 0.013%