INDEX
Explanations
phrases indicating a high likelihood or possibility
statements expressing a perception or opinion
Negative Logits
estern     -0.84
rouse      -0.75
iding      -0.75
rient      -0.73
venge      -0.73
atching    -0.71
cum        -0.70
orthern    -0.68
Industry   -0.67
learning   -0.66
Positive Logits
rils       0.81
plaus      0.78
unlikely   0.78
contrad    0.78
ãĤ¨        0.75
uncertain  0.74
likely     0.74
doubtful   0.74
Magikarp   0.73
innocuous  0.73
Activations Density 0.044%