Explanations
consistent mentions of a specific symbol or character
words related to legal or authoritative statements
Negative Logits
izen        -0.70
jog         -0.69
chnology    -0.68
puff        -0.66
agall       -0.64
berman      -0.64
snail       -0.63
habit       -0.60
ponder      -0.59
misunder    -0.59
Positive Logits
said        1.06
ï¸ı         0.95
cause       0.91
laugh       0.83
yet         0.83
tra         0.81
cue         0.80
mr          0.80
except      0.80
dn          0.79
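The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. Dashboards of this kind commonly produce such lists by taking the dot product of the feature's decoder direction with each column of the model's unembedding matrix. The sketch below assumes that convention and ignores the final LayerNorm. The function name `top_logits`, the toy vocabulary and the weights are all hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, float]


def top_logits(
    decoder_dir: List[float],
    unembed: List[List[float]],  # unembed[i][j]: weight from residual dim i to token j
    vocab: List[str],
    k: int,
) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and k most negative (token, score) pairs.

    Assumption (hypothetical convention): a token's score is the dot product
    of the feature's decoder direction with that token's unembedding column,
    i.e. the feature's direct effect on the token's logit.
    """
    d_model = len(decoder_dir)
    scores: List[Pair] = []
    for j, token in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["said", "cause", "jog", "snail"]
unembed = [
    [1.0, 0.5, -0.5, -0.2],
    [0.2, 0.4, -0.1, -0.6],
]
decoder_dir = [1.0, 0.5]
pos, neg = top_logits(decoder_dir, unembed, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

With real weights, `vocab` would hold the model's tokenizer strings. That is why the lists show sub-word pieces such as "chnology" and "misunder" rather than whole words.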
Activation Density: 0.166%
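Activation density is the share of sampled tokens on which the feature fires. A density of 0.166% corresponds to roughly one token in 600. The sketch below assumes that "fires" means an activation strictly above a threshold of zero. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy sample: 3 active tokens out of 1,000 gives a density of 0.3%.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"Activation Density: {activation_density(sample):.3f}%")
```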