Explanations
words related to negative behaviors and dishonesty
terms related to deception, dishonesty, and disconnection
Negative Logits
Token            Logit
Reviewer         -0.93
Ear              -0.78
Hits             -0.73
Redditor         -0.65
DragonMagazine   -0.62
Mara             -0.62
Els              -0.62
Ital             -0.62
antioxid         -0.61
Enlarge          -0.61
Positive Logits
Token            Logit
liction           1.04
lement            0.96
cipline           0.90
ention            0.90
icating           0.90
legate            0.89
arer              0.89
isive             0.88
iencies           0.85
igmat             0.84
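Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive tokens. The sketch below illustrates that idea. It assumes a plain dot product with no normalization, because this page does not say how its values were computed. The names `decoder_dir`, `W_U`, and `top_logits` are hypothetical, and the weights are toy numbers, not the real model's.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Project one feature's decoder direction through the unembedding and
    return the k most negative and k most positive (token, value) pairs.

    Assumed (hypothetical) shapes:
      decoder_dir: (d_model,)            one SAE decoder row
      W_U:         (d_model, vocab_size) model unembedding matrix
      vocab:       list of vocab_size token strings
    """
    logits = decoder_dir @ W_U           # direct effect on every token's logit
    order = np.argsort(logits)           # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy weights chosen by hand for illustration only.
vocab = ["Reviewer", "Ear", "liction", "lement"]
W_U = np.array([[-1.0, -0.5, 1.0, 0.8],
                [ 0.1,  0.2, 0.3, 0.1]])
decoder_dir = np.array([1.0, 0.0])
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('Reviewer', -1.0), ('Ear', -0.5)]
print(pos)  # [('liction', 1.0), ('lement', 0.8)]
```

Sorting once and reading the array from both ends yields both lists from a single pass over the vocabulary.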
Activation Density: 0.067%
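Activation density is usually read as the fraction of token positions at which the feature fires. On that reading, 0.067% means roughly 67 firing positions per 100,000 tokens. The following is a minimal sketch under that assumption. It treats "fires" as "activation above zero" and uses made-up activations. The function name and threshold are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Made-up activations: 7 nonzero values among 10,000 token positions.
acts = np.zeros(10_000)
acts[:7] = [0.5, 1.2, 0.3, 2.0, 0.9, 0.1, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.070%
```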