INDEX
Explanations
specific mentions or references in text
occurrences of the word "mentions."
NEGATIVE LOGITS
sett        -0.85
orneys      -0.78
arine       -0.70
ridge       -0.70
padd        -0.68
squ         -0.68
iership     -0.68
otypes      -0.66
psons       -0.65
ascript     -0.64
POSITIVE LOGITS
mentions    1.17
mentioning  1.07
mention     0.92
ãĤ¼ãĤ¦ãĤ¹   0.84
ij士         0.78
marks       0.75
Vegeta      0.73
ãĤ®         0.72
è£ı         0.72
places      0.72
Activations Density 0.007%