Explanations
words like "admit", "confess", and "say" indicating acknowledgment or disclosure
expressions of acknowledgment or admission of personal feelings or actions
Negative Logits
decom       -0.71
locked      -0.66
ishable     -0.64
let         -0.61
stall       -0.61
lockdown    -0.59
lets        -0.57
arent       -0.57
scrimmage   -0.57
resur       -0.57
Positive Logits
anecd       0.77
hindsight   0.76
eno         0.74
fav         0.68
admit       0.68
vv          0.63
lers        0.62
olson       0.61
erning      0.61
Uri         0.61
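The page does not say how these logit scores are produced. A common convention for sparse-autoencoder feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and list the most negative and most positive tokens. The sketch below assumes that convention; the names `top_logits`, `decoder_dir`, and `W_U` are hypothetical, and the toy weights are made up for illustration, not taken from any real model.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: scores = decoder_dir @ W_U, where W_U has shape
    (d_model, vocab_size) and decoder_dir has shape (d_model,).
    """
    scores = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(scores)          # token indices, ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with hypothetical weights (d_model = 2, vocab_size = 5).
vocab = ["admit", "locked", "hindsight", "let", "fav"]
W_U = np.array([[0.9, -0.7, 0.8, -0.6, 0.5],
                [0.1,  0.0, 0.2, -0.1, 0.3]])
decoder_dir = np.array([1.0, 0.0])
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print(neg)  # most negative tokens first
print(pos)  # most positive tokens first
```

Under this assumption, a positive score means the feature pushes the model toward predicting that token next, and a negative score means it pushes away from it.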
Activation Density: 0.052%
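Activation density is usually read as the percentage of tokens on which the feature fires at all; 0.052% works out to roughly 1 token in 1,900 (100 / 0.052 ≈ 1,923). The sketch below assumes "fires" means an activation strictly greater than zero; the function name `activation_density` and its `threshold` parameter are hypothetical, and the sample data is synthetic.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is > threshold
    (0.0 by default, the usual convention for ReLU-style SAE features).
    """
    activations = np.asarray(activations, dtype=float)
    active = np.count_nonzero(activations > threshold)
    return 100.0 * active / activations.size

# Synthetic example: 52 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:52] = 1.0
print(activation_density(acts))  # percentage of active tokens
```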