INDEX
Explanations
phrases related to knowledge or awareness
Negative Logits
otion          -0.79
isco           -0.75
phrine         -0.75
ItemTracker    -0.74
onding         -0.73
erva           -0.69
ermanent       -0.68
pex            -0.66
acco           -0.65
avored         -0.65
Positive Logits
ledged          1.14
ledge           1.11
lege            0.98
firsthand       0.97
beforehand      0.96
how             0.92
instinctively   0.83
nothing         0.83
nothing         0.79
LED             0.79
Activation Density: 0.464%