INDEX
Explanations
the word "thoughts"
references to personal thoughts or opinions
Negative Logits
ALL          -0.77
Mamm         -0.69
Claim        -0.69
ARDS         -0.61
toe          -0.61
Ann          -0.58
Breach       -0.57
rake         -0.56
ards         -0.56
announced    -0.55
Positive Logits
aloud        0.88
fulness      0.87
cience       0.85
ileaks       0.84
ynthesis     0.79
mith         0.79
umar         0.77
thoughts     0.77
warts        0.75
matter       0.73
Activations Density 0.017%