Explanations
proper nouns or named entities within personal narratives
Negative Logits
abal      -0.70
conclud   -0.67
consumed  -0.63
imately   -0.63
Liver     -0.62
edia      -0.61
icip      -0.60
iland     -0.59
reaches   -0.57
imedia    -0.56
Positive Logits
bluff     1.29
ãĥ¼ãĥ³    0.83
hotline   0.77
EStream   0.72
Tes       0.71
Duty      0.69
shots     0.68
bullshit  0.68
icide     0.65
liar      0.65
Activations Density 1.032%