INDEX
Explanations
names or partial names of individuals
specific names or identifiers
Negative Logits
bureaucr    -0.77
corridors   -0.76
acron       -0.72
TRA         -0.72
prost       -0.70
shoulder    -0.69
multim      -0.68
rift        -0.67
rain        -0.67
HOT         -0.66
Positive Logits
ney         1.03
rex         0.99
eware       0.99
sted        0.99
ault        0.99
atron       0.98
ember       0.98
yon         0.98
artz        0.97
bach        0.97
Activations Density: 0.157%