INDEX
Explanations
phrases starting with "According" followed by information or attribution
statements that attribute information or claims to sources
Negative Logits
DOWN        -0.62
godd        -0.58
heights     -0.53
âϦ          -0.53
eleph       -0.53
mathemat    -0.53
helicop     -0.52
swe         -0.51
gobl        -0.50
submar      -0.50
Positive Logits
ly            1.14
to            0.95
edly          0.86
liest         0.82
gest          0.75
lly           0.74
itionally     0.68
translation   0.68
LY            0.67
ities         0.66
Activations Density: 0.035%