INDEX
Explanations
emotional reactions or opinions expressed in written text
expressions of disappointment or dissatisfaction
Negative Logits
Token        Logit
Mirage       -0.65
pyramid      -0.62
accomp       -0.62
shadow       -0.61
specialist   -0.61
crus         -0.60
Camel        -0.60
Amon         -0.59
presumed     -0.59
trainer      -0.59
Positive Logits
Token        Logit
onto         1.01
ï¸ı          0.99
ationally    0.95
gently       0.94
him          0.93
efe          0.89
against      0.89
early        0.87
selves       0.85
oward        0.84
Activations Density: 0.243%