Explanations
discussions of societal issues, particularly accusations and how they affect individuals
Negative Logits
poffe       -1.04
purpoſe     -1.04
pleaſure    -1.03
deſt        -0.97
houſe       -0.96
fubject     -0.95
ſever       -0.94
tranſ       -0.93
neceff      -0.93
himſelf     -0.92
Positive Logits
whatnot     0.79
stuff       0.75
things      0.69
maybe       0.68
thingy      0.66
,           0.64
et          0.60
doings      0.60
Maybe       0.59
maybe       0.58
Activations Density 0.407%