INDEX
Explanations
words related to criticism or negative evaluation
terms related to significant consequences or effects
Negative Logits
Awakens      -0.68
Kardash      -0.66
Masquerade   -0.57
fitt         -0.54
trusts       -0.52
didnt        -0.51
Kenn         -0.51
retrie       -0.50
Wes          -0.49
Mak          -0.49
Positive Logits
lie          0.72
maxwell      0.68
ieu          0.68
JECT         0.62
onite        0.61
olina        0.61
ril          0.61
oton         0.59
acus         0.58
inguished    0.58
Activations Density: 1.571%