INDEX
Explanations
phrases indicating significant concepts or ideas that underlie essays and arguments
Head Attr Weights
0:0.02
1:0.01
2:0.09
3:0.12
4:0.30
5:0.01
6:0.09
7:0.12
8:0.03
9:0.04
10:0.04
11:0.08
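The head weights above can be read programmatically to see which head dominates. The following is a minimal sketch, assuming each line is a `head:weight` pair as listed; the names `RAW` and `parse_head_weights` are hypothetical and not part of the dashboard. The listing does not say why the weights sum to less than 1, so the sketch only reports the total.

```python
# Head attribution weights copied verbatim from the listing ("head:weight" per line).
RAW = """0:0.02
1:0.01
2:0.09
3:0.12
4:0.30
5:0.01
6:0.09
7:0.12
8:0.03
9:0.04
10:0.04
11:0.08"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a dict mapping head index to weight."""
    weights = {}
    for line in text.splitlines():
        head, _, value = line.partition(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(RAW)

# Head with the largest attribution weight.
top_head = max(weights, key=weights.get)

# Heads ranked from largest to smallest weight (ties keep listing order).
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# Sum of all listed weights.
total = sum(weights.values())

print(f"top head: {top_head} (weight {weights[top_head]:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

On this listing, head 4 carries the largest weight (0.30), followed by heads 3 and 7 at 0.12 each.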
Negative Logits
ynes          -1.88
favorably     -1.47
ussen         -1.42
���           -1.39
ebin          -1.32
interacted    -1.32
hadn          -1.32
augh          -1.31
Nex           -1.31
UNCLASSIFIED  -1.29
Positive Logits
mysteries      1.52
paradox        1.50
mosa           1.42
Infinite       1.39
blance         1.33
EEK            1.32
fatal          1.31
individuality  1.30
misery         1.30
carnage        1.29
Activations Density 0.001%