INDEX
Explanations
special characters such as quotes and parentheses
quotation marks or references to dialogue
Negative Logits
Nieto         -0.79
offended      -0.76
Klu           -0.73
synd          -0.72
cancell       -0.71
predomin      -0.71
Tribune       -0.70
retribution   -0.70
anim          -0.69
constitu      -0.69
Positive Logits
normal                     1.11
cheat                      1.11
false                      1.08
true                       1.08
official                   1.06
Hey                        1.04
BuyableInstoreAndOnline    1.02
catentry                   1.02
Hello                      1.01
Empty                      1.01
Activations Density: 0.108%