INDEX
Explanations
It looks for statements indicating knowledge or information
statements of knowledge or factual assertions
Negative Logits
pex        -0.87
oshenko    -0.75
cit        -0.73
cific      -0.67
cohol      -0.65
otti       -0.65
cus        -0.65
ksh        -0.64
onies      -0.64
rentice    -0.64
Positive Logits
ourselves   0.85
how         0.73
ledge       0.71
ledged      0.71
plenty      0.71
ariat       0.68
hinges      0.66
lege        0.62
enough      0.62
nothing     0.62
Activations Density: 0.090%