INDEX
Explanations
sentences related to philosophical and abstract discussions
references to subjective evaluations or criticisms of artistic works
Negative Logits
Veterans      -0.60
Flam          -0.58
Adren         -0.56
Nepal         -0.56
Wildlife      -0.55
Dri           -0.55
Globe         -0.54
Sang          -0.53
Lunar         -0.53
Deng          -0.53
Positive Logits
)).           0.84
?).           0.83
existed       0.75
infall        0.74
totality      0.73
oneself       0.72
objectively   0.72
actual        0.70
unlaw         0.70
?".           0.69
Activations Density 1.353%