INDEX
Explanations
names of specific individuals
names of prominent individuals and locations
Negative Logits
door      -0.82
Down      -0.69
Charge    -0.69
ibur      -0.68
neck      -0.68
ground    -0.67
cause     -0.66
Rush      -0.66
brother   -0.66
icular    -0.64
Positive Logits
itives    0.88
rary      0.87
opol      0.78
ilities   0.75
erald     0.74
urate     0.73
ouls      0.71
olitan    0.69
opoulos   0.69
İ         0.68
Activations Density: 0.045%