Explanations
names or entities with 'je' in them
names, specifically proper nouns
Negative Logits
ingham    -0.83
moving    -0.75
prop      -0.74
iqu       -0.73
norm      -0.70
izoph     -0.70
ャ        -0.69
iques     -0.68
markets   -0.68
runner    -0.66
Positive Logits
lde       1.08
anwhile   1.03
lda       0.87
llan      0.86
gger      0.84
cki       0.80
ktop      0.78
ansen     0.78
chal      0.77
irs       0.77
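The dashboard does not say how these logit values were produced. A common approach in sparse-autoencoder feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score, listing the lowest scores as negative logits and the highest as positive logits. The following is a minimal pure-Python sketch under that assumption. The function and argument names are hypothetical, and the unembedding is taken to be laid out as `unembed[d][t]` (model dimension by vocabulary).

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive token logit effects.

    ASSUMPTION: logit effect of token t = dot(decoder_dir, unembed[:, t]),
    i.e. the feature's decoder direction pushed through the unembedding.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")

    effects = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with token t's unembedding column.
        score = sum(decoder_dir[d] * unembed[d][t] for d in range(d_model))
        effects.append((token, score))

    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive


# Toy usage: a 2-dimensional model with a 3-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, -1.0],
    [[0.5, 0.2, -0.3], [0.1, 0.4, 0.2]],
    ["a", "b", "c"],
    k=2,
)
print(neg)  # tokens that the feature pushes down
print(pos)  # tokens that the feature pushes up
```

Real dashboards may also normalize the decoder direction or apply the final layer norm first; the sketch omits both.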
Activation Density: 0.037%
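Activation density is usually read as the share of sampled token positions on which the feature fires. The dashboard states neither the sample size nor the firing threshold, so both are assumptions here: the sketch counts activations strictly above a hypothetical `threshold` (default 0) and reports the result as a percentage.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activates above threshold.

    ASSUMPTION: "fires" means activation > threshold; the dashboard's exact
    definition and sample size are not given.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 37 firing positions out of 100,000.
sample = [0.0] * 99_963 + [1.5] * 37
print(activation_density(sample))
```

With 37 firing positions out of 100,000 this gives 0.037%, matching the scale of the figure above. A density this low means the feature is very sparse, active on roughly one token in 2,700.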