INDEX
Explanations
words related to locations or organizations
words related to accountability
Negative Logits
JV       -0.80
hua      -0.73
PB       -0.72
hyde     -0.67
SG       -0.66
LM       -0.65
PLA      -0.65
geist    -0.65
knife    -0.64
ben      -0.64
Positive Logits
uracy        1.23
redited      1.11
identally    1.00
urate        0.99
uzz          0.99
inations     0.95
idental      0.94
ustomed      0.92
acies        0.91
urrency      0.90
Activations Density 0.011%