INDEX
Explanations
phrases related to personal qualities and characteristics
statements related to emotional or psychological observations
Negative Logits
Token      Logit
swick      -0.88
asio       -0.70
ESE        -0.69
wright     -0.69
afety      -0.69
hers       -0.66
SHIP       -0.64
stanbul    -0.64
bia        -0.62
thora      -0.62
Positive Logits
Token        Logit
albeit       0.92
gradient     0.85
etc          0.76
encomp       0.72
economical   0.71
huh          0.69
sounding     0.69
minded       0.68
itar         0.64
contro       0.64
Activations Density 0.289%
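
The page does not say how the density figure is computed. As a rough sketch, assuming density means the fraction of sampled token positions on which the feature's activation is above zero, it could be calculated as below. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.289% would mean the feature is active on roughly 3 of every 1,000 sampled tokens.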