INDEX
Explanations
proper nouns starting with an apostrophe
various forms of contractions or possessive forms
Negative Logits
rematch      -0.67
cycle        -0.67
eclipse      -0.65
rador        -0.63
contrast     -0.61
conservancy  -0.61
handc        -0.60
transpl      -0.60
isphere      -0.59
coupled      -0.59
Positive Logits
Allah        0.91
atri         0.87
Cause        0.87
Mech         0.85
MIC          0.82
Angelo       0.82
Brien        0.81
nai          0.75
arak         0.75
Malley       0.74
Activations Density 0.044%