Explanations
mentions of specific words
specific observational statements or claims
Negative Logits
Token       Logit
NL          -0.68
GA          -0.65
haun        -0.63
illed       -0.61
agog        -0.61
CrossRef    -0.60
rams        -0.60
rique       -0.58
alon        -0.58
egu         -0.58
Positive Logits
Token       Logit
ournal      0.78
piracy      0.76
leased      0.73
ercise      0.70
omsky       0.69
argon       0.68
estate      0.67
Material    0.66
ortium      0.66
eBook       0.66
Activations Density: 0.000%