Explanations
proper nouns or names
the mention of "Col" followed by numerical values or related identifiers
Negative Logits
Token       Logit
REM         -0.75
avorite     -0.69
slack       -0.67
andowski    -0.66
ucks        -0.65
fumble      -0.65
swing       -0.64
hift        -0.63
lambda      -0.63
tarian      -0.62
Positive Logits
Token       Logit
Col         3.72
Col         2.35
col         1.94
col         1.88
COL         1.79
COL         1.70
Colonel     1.52
Maj         1.50
Colon       1.46
Brig        1.32
Activations Density: 0.022%
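
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 100,000 token positions, 22 of them with a nonzero activation.
sample = [0.0] * 99_978 + [1.3] * 22
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.022%
```

Under this reading, a density of 0.022% would mean the feature fires on roughly 22 of every 100,000 tokens, which fits a narrow feature tied to "Col" and related military rank abbreviations.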