INDEX
Explanations
words related to negative actions or characteristics, specifically focusing on petty behavior
references to trivial or minor offenses
Negative Logits
Downloadha   -0.89
ahead        -0.78
hov          -0.77
ioch         -0.76
hetti        -0.76
ources       -0.76
avez         -0.74
heimer       -0.73
Recomm       -0.73
igslist      -0.72
Positive Logits
petty         0.91
cipled        0.79
theft         0.77
arithmetic    0.72
Petty         0.72
misdemeanor   0.72
provocation   0.67
tru           0.66
fel           0.66
Theft         0.66
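The following is a minimal sketch of how logit lists like the two above are commonly produced for a sparse-autoencoder feature. It assumes the method is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and k lowest scores. This page does not state the method, so treat that as an assumption. The names `decoder_dir`, `W_U`, and `top_logit_effects` are hypothetical. The toy weights are illustrative and are not the real model's.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return (positive, negative) lists of (token, score) pairs.

    Assumption: decoder_dir has shape (d_model,) and W_U has shape
    (d_model, d_vocab), so decoder_dir @ W_U gives one score per token.
    """
    scores = decoder_dir @ W_U                 # direct effect on each token's logit
    order = np.argsort(scores)                 # token indices, ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with hypothetical weights (d_model = 2, four-token vocabulary).
vocab = ["petty", "theft", "ahead", "hov"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.9, 0.7, -0.8, -0.7],
                [0.0, 0.0, 0.0, 0.0]])
positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(positive)
print(negative)
```

Sorting once and reading the order from both ends yields the "Positive" and "Negative" lists in a single pass.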
Activation density: 0.014%
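Activation density is usually read as the fraction of tokens on which the feature fires. This minimal sketch assumes that "fires" means an activation above zero. The function name `activation_density` and the threshold are hypothetical. At 0.014%, the feature would fire on about 14 of every 100,000 tokens.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    return float(np.mean(np.asarray(acts) > threshold))

# Illustrative activations: 100,000 tokens, 14 of which fire.
acts = np.zeros(100_000)
acts[:14] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```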