INDEX
Explanations
references to specific entities or important subjects in a discussion
Negative Logits
âĢº          -0.78
--+          -0.76
thood        -0.75
etooth       -0.72
contained    -0.68
ustom        -0.67
LOD          -0.66
.","         -0.65
bg           -0.65
pection      -0.65
Positive Logits
resa         1.20
irony        1.17
implication  1.12
oret         1.10
truth        1.09
slightest    1.08
facts        1.04
hypocrisy    1.02
Economist    1.02
fact         1.01
Activations Density: 0.404%