INDEX
Explanations
names or words related to actions of awakening
words related to mistakes and errors
Negative Logits
heter        -0.71
mson         -0.69
brates       -0.63
actionDate   -0.62
posium       -0.62
rolog        -0.61
brate        -0.60
includ       -0.60
thor         -0.60
acious       -0.59
Positive Logits
aken         0.87
iak          0.83
fold         0.80
alia         0.80
umber        0.80
ede          0.77
ication      0.74
unciation    0.72
atche        0.72
nered        0.72
Activations Density: 0.015%