Explanations
manipulative traits and actions in characters or individuals
Negative Logits
basicConfig    -0.40
krankungen     -0.38
criptive       -0.38
spazz          -0.37
gesteld        -0.36
qual           -0.36
radora         -0.36
verantwoorde   -0.36
quality        -0.35
stanz          -0.35
Positive Logits
trick          1.05
cunning        0.95
tricks         0.93
manipulate     0.81
tricks         0.81
Trick          0.80
Tricks         0.79
sneaky         0.79
manipulation   0.78
Trick          0.78
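Ranked lists like the negative and positive logits above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token logits. The page does not state its procedure, so the sketch below assumes exactly that one. The helpers `feature_logits` and `top_and_bottom` and all toy values are hypothetical, and plain Python lists stand in for real model weights.

```python
from typing import List, Sequence, Tuple


def feature_logits(direction: Sequence[float],
                   unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: `unembed` is indexed [d_model][vocab_size], so
    logit[j] = sum_i direction[i] * unembed[i][j].
    """
    if len(direction) != len(unembed):
        raise ValueError("direction and unembed must share d_model")
    vocab_size = len(unembed[0])
    return [
        sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j in range(vocab_size)
    ]


def top_and_bottom(logits: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, logit) pairs."""
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    most_negative = ranked[:k]        # ascending order: most negative first
    most_positive = ranked[::-1][:k]  # descending order: most positive first
    return most_positive, most_negative


# Toy example with hypothetical numbers (d_model = 2, four tokens).
vocab = ["trick", "cunning", "quality", "stanz"]
direction = [1.0, -0.5]
unembed = [[1.0, 0.8, -0.2, 0.1],
           [0.0, -0.2, 0.4, 0.3]]
positive, negative = top_and_bottom(feature_logits(direction, unembed), vocab, 2)
print([t for t, _ in positive])  # ['trick', 'cunning']
print([t for t, _ in negative])  # ['quality', 'stanz']
```

Real tooling would do the projection as a single matrix-vector product over the full vocabulary; the nested loop here only keeps the sketch dependency-free.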
Activation density: 0.356%
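Activation density is usually read as the fraction of tokens, over some evaluation set, on which the feature's activation is nonzero. Assuming that definition and a firing threshold of zero (both assumptions, since the page does not define the metric), it can be computed as below; `activation_density` and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical activations for ten tokens; three of them fire.
acts = [0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.1]
print(f"{activation_density(acts):.1%}")  # 30.0%
```

On that reading, a density of 0.356% means the feature fires on roughly 3.6 of every 1,000 tokens.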