INDEX
Explanations
instances where someone publicly criticizes or points out something or someone
phrases emphasizing calling someone out or criticizing actions
Negative Logits
gamble     -0.67
princip    -0.66
fing       -0.63
oreal      -0.62
entry      -0.61
mint       -0.61
assum      -0.59
parable    -0.59
confir     -0.58
ushima     -0.58
Positive Logits
stretched  0.98
loud       0.97
casts      0.82
loudly     0.74
posts      0.74
Sinclair   0.72
tical      0.70
lier       0.70
tics       0.70
smart      0.69
Activations Density 0.018%
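
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed. It assumes density means the percentage of tokens on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    The source page does not define the metric.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 9 of 50,000 tokens.
sample = [0.0] * 49991 + [1.0] * 9
print(f"{activation_density(sample):.3f}%")  # prints "0.018%"
```

A density this low means the feature is sparse: under the assumed definition, it fires on roughly 2 tokens in every 10,000.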