INDEX
Explanations
phrases indicating knowledge or awareness
phrases asserting common knowledge or consensus
Negative Logits
pex        -0.88
ermanent   -0.78
erva       -0.76
osi        -0.73
cific      -0.73
rentice    -0.73
onial      -0.71
ĪĴ         -0.71
oshenko    -0.71
cohol      -0.70
Positive Logits
ledge       0.89
ledged      0.87
how         0.84
beforehand  0.74
lege        0.71
ariat       0.71
nothing     0.69
л           0.68
nothing     0.67
why         0.66
Activations Density: 0.076%