Explanations
mentions of specific names, titles, or entities
references to specific organizations, food items, and individuals
Negative Logits
ãĥ¥          -0.79
kees         -0.75
ocular       -0.68
£            -0.66
iltration    -0.64
allowance    -0.61
WAYS         -0.60
NHS          -0.60
ffen         -0.60
eral         -0.59
Positive Logits
aic           0.89
y             0.88
ments         0.85
spe           0.79
Ģ             0.77
sonian        0.76
¯             0.76
teenth        0.75
pillar        0.75
ors           0.74
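The page does not say how the logit scores were computed. A minimal sketch, assuming each score is the direct logit effect of the feature (the dot product of the feature's decoder direction with each token's unembedding vector), and that the lists show the k most negative and k most positive scores. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed scoring: dot product of the feature's decoder direction
    with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most negative, k most positive) tokens by score."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
effects = logit_effects([1.0, 0.0],
                        {"a": [0.5, 1.0], "b": [-0.3, 2.0], "c": [0.9, 0.0]})
neg, pos = top_and_bottom(effects, 1)
print(neg, pos)
```

With a real model the vocabulary has tens of thousands of tokens, so the same ranking yields the ten-row lists shown above.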
Activation Density: 0.029%
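A density of 0.029% means the feature fires on roughly 29 of every 100,000 tokens. A minimal sketch, assuming density is the percentage of sampled tokens whose activation exceeds zero; the page does not state the dataset, sample size, or threshold, and the `activation_density` helper is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Hypothetical sample: 29 nonzero activations among 100,000 tokens.
sample = [1.0] * 29 + [0.0] * (100_000 - 29)
print(activation_density(sample))
```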