Explanations
truth-related statements
references to truth and subjective opinions
Negative Logits
obook        -0.70
Mehran       -0.63
etsk         -0.63
riers        -0.63
oplan        -0.57
naires       -0.57
Banner       -0.56
({           -0.56
enance       -0.55
Skydragon    -0.53
Positive Logits
ya           0.97
arently      0.96
wise         0.95
ortunately   0.93
cially       0.90
identally    0.90
caveat       0.90
wise         0.89
pecially     0.88
nown         0.86
Activations Density: 0.214%