INDEX
Explanations
phrases related to decision-making or introspection
references to personal relationships and social interactions
Negative Logits
eree                -0.53
natureconservancy   -0.52
onymous             -0.51
oplan               -0.51
utterly             -0.50
Lens                -0.50
ographically        -0.50
RGB                 -0.48
azing               -0.48
virt                -0.47
Positive Logits
Ago                 0.67
ago                 0.64
yesterday           0.63
nesday              0.62
bc                  0.62
confir              0.59
indicating          0.58
Asked               0.56
towards             0.56
inquired            0.55
Activations Density 1.065%