INDEX
Explanations
expressions of understanding or rationale regarding situations and attitudes
Negative Logits
Token           Logit
chin            -0.83
raviolet        -0.80
beam            -0.79
erial           -0.77
alion           -0.76
etry            -0.76
eng             -0.74
roots           -0.74
esh             -0.73
rays            -0.70
Positive Logits
Token           Logit
why             0.83
Mellon          0.83
frustration     0.76
curiosity       0.74
understandable  0.74
skepticism      0.71
frustrations    0.71
impat           0.71
annoyance       0.70
altru           0.69
Activations Density 0.015%