INDEX
Explanations
words related to storytelling and personal anecdotes
conversational expressions and social interactions
Negative Logits
Token         Logit
士            -0.81
£ı            -0.78
vre           -0.75
unal          -0.68
Flavoring     -0.67
Perhaps       -0.65
Enough        -0.64
ufficient     -0.64
Updated       -0.63
safegu        -0.63
Positive Logits
Token         Logit
kind          0.99
uh            0.95
kinda         0.93
laughing      0.90
['            0.89
fuckin        0.87
saying        0.86
yelling       0.85
[             0.85
grinning      0.84
Activations Density 0.414%
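
The density figure can be read as the share of token positions on which this feature fires. The page does not define the metric, so this reading is an assumption. Under it, 0.414% is roughly 4 active tokens per 1,000. A minimal sketch follows; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens with a non-zero activation.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 414 active positions out of 100,000 tokens.
sample = [1.0] * 414 + [0.0] * (100_000 - 414)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.414%
```

A density this low would be consistent with a feature that responds to a narrow class of contexts, such as the conversational and storytelling text named in the explanations.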