Explanations
the word "alone" with a high level of activation
Negative Logits
Token         Logit
iple          -0.90
ELD           -0.81
UG            -0.72
aim           -0.70
file          -0.70
oline         -0.69
ils           -0.68
igr           -0.66
alien         -0.66
ager          -0.64
Positive Logits
Token         Logit
anything       0.84
acular         0.83
necessarily    0.77
comprehend     0.76
TAMADRA        0.73
outright       0.73
any            0.72
reap           0.72
anywhere       0.70
yours          0.70
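Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values. The sketch below assumes exactly that setup (a sparse-autoencoder-style decoder vector of length d_model and an unembedding of shape d_model x vocab_size). It is not confirmed to be the procedure this dashboard uses, and the function names and toy values are hypothetical, chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: dot the decoder direction (length d_model) with
    each column of the unembedding (d_model rows x vocab_size columns)."""
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom_tokens(effects: Sequence[float],
                          vocab: Sequence[str],
                          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive tokens."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negatives = ranked[:k]        # most suppressed first (most negative value)
    positives = ranked[::-1][:k]  # most promoted first (most positive value)
    return negatives, positives


# Toy example (made-up numbers, d_model = 2, vocab of 3 tokens).
decoder_dir = [1.0, 0.0]
w_unembed = [[0.5, -0.9, 0.2],
             [9.0, 9.0, 9.0]]
vocab = ["any", "iple", "alone"]
effects = logit_effects(decoder_dir, w_unembed)
negatives, positives = top_and_bottom_tokens(effects, vocab, k=1)
print(negatives, positives)
```

Because only the ranking matters for the lists, any positive rescaling of the decoder direction leaves the token order unchanged.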
Activations Density: 0.038%
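Activation density is usually read as the share of sampled tokens on which the feature fires at all. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0. The dashboard's actual threshold and sample set are not stated here, and the function name is hypothetical.

```python
from typing import Iterable


def activation_density_percent(activations: Iterable[float],
                               threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of activations strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# 38 active tokens out of 100,000 sampled tokens gives a density of about 0.038%.
sample = [0.0] * 99_962 + [1.0] * 38
print(activation_density_percent(sample))
```

A density this low means the feature is silent on the vast majority of tokens, which fits a narrow explanation such as a single word.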