INDEX
Explanations
phrases indicating knowledge or awareness
references to knowledge or awareness
Negative Logits
inqu     -0.79
phant    -0.74
ゴン     -0.74
vik      -0.73
onding   -0.65
cohol    -0.64
verse    -0.64
sidx     -0.64
viks     -0.64
reau     -0.64
Positive Logits
firsthand        1.25
how              1.19
instinctively    1.12
exactly          1.11
better           1.10
what             0.97
best             0.96
intimately       0.96
perfectly        0.91
how              0.89
Activations Density 0.088%