INDEX
Explanations
proper nouns or names
occurrences of the word "is" and its variations
Negative Logits
Token        Logit
Reviewer     -0.76
Redux        -0.68
Redditor     -0.65
ysis         -0.63
Crescent     -0.62
orsi         -0.62
WARNING      -0.60
enment       -0.60
Revelations  -0.59
handc        -0.59
Positive Logits
Token        Logit
iber         0.85
oy           0.83
umi          0.83
etooth       0.81
enei         0.81
sel          0.81
elo          0.81
achu         0.80
nown         0.79
ako          0.78
Activations Density: 0.103%