INDEX
Explanations
proper nouns
references to a specific individual or name
Negative Logits
STD          -0.74
vt           -0.72
UAL          -0.69
EMA          -0.68
©            -0.67
diagn        -0.67
Turing       -0.65
preference   -0.64
bunker       -0.64
vets         -0.63
Positive Logits
Ol           3.77
Ol           2.45
ol           1.86
OL           1.45
Ole          1.42
Olson        1.38
Osw          1.29
Oliver       1.28
Oss          1.25
Olsen        1.24
Activations Density 0.013%