INDEX
Explanations
words related to physical discomfort or harm
the word "from" used in various contexts
Negative Logits
ratulations   -0.76
sic           -0.70
ierrez        -0.69
sat           -0.69
iddles        -0.68
uri           -0.68
busters       -0.67
unes          -0.67
isode         -0.65
trump         -0.64
Positive Logits
afar          1.57
whence        1.17
anywhere      0.90
thence        0.88
elsewhere     0.87
everywhere    0.87
inside        0.84
wherever      0.84
somewhere     0.83
across        0.81
Activations Density: 0.155%