INDEX
Explanations
proper nouns and locations, specifically those related to sports and politics
Negative Logits
favors      -0.58
Jinn        -0.56
scraps      -0.53
UFOs        -0.51
UFO         -0.51
Ved         -0.50
sher        -0.50
constitu    -0.49
Favor       -0.48
synopsis    -0.48
Positive Logits
wagen       0.89
helm        0.78
ggle        0.76
gart        0.76
wart        0.75
leck        0.71
ilda        0.71
otyp        0.68
amac        0.67
lich        0.66
Activations Density 0.115%