INDEX
Explanations
phrases indicating the source or origin of information or perspective
occurrences of the word "from."
Negative Logits
ratulations   -0.84
hai           -0.72
merce         -0.71
irm           -0.71
isode         -0.69
quote         -0.67
mathemat      -0.66
aca           -0.66
BSD           -0.63
lied          -0.62
Positive Logits
afar          1.55
whence        1.18
elsewhere     0.92
scratch       0.91
inside        0.90
anywhere      0.89
thence        0.88
somewhere     0.85
everywhere    0.84
abroad        0.83
Activations Density: 0.168%