INDEX
Explanations
expressions describing personal experiences or accomplishments
expressions of significant personal experiences and emotions
Negative Logits
pmwiki       -0.91
krit         -0.78
dispute      -0.77
inconsist    -0.75
derog        -0.72
Dialog       -0.70
defic        -0.69
plaus        -0.69
disputes     -0.68
farious      -0.68
Positive Logits
Seeing        1.09
witnessing    1.09
seeing        1.04
Seeing        1.03
watching      1.03
cheering      1.02
Watching      0.98
THANK         0.98
knowing       0.97
watching      0.95
Activations Density: 0.426%