INDEX
Explanations
phrases indicating indirect suggestions or meanings
statements that suggest or indicate a certain conclusion or implication
Negative Logits
mir      -0.74
HCR      -0.72
mar      -0.71
xxxx     -0.69
Reds     -0.68
ajo      -0.66
unker    -0.66
skill    -0.66
Chips    -0.66
sung     -0.65
Positive Logits
imply       1.26
implies     1.15
implied     1.06
infer       0.94
implying    0.93
inferred    0.78
extrap      0.75
WARRANT     0.75
inference   0.75
contradict  0.73
Activations Density: 0.009%