INDEX
Explanations
phrases related to laws, regulations, and official actions
statements about capabilities or the state of being
Negative Logits
targ        -0.62
)|          -0.61
Meridian    -0.57
hatch       -0.57
Swed        -0.57
derby       -0.55
trophy      -0.54
retrie      -0.54
sweats      -0.54
",          -0.54
Positive Logits
ĸļ          0.80
yip         0.75
laun        0.72
xtap        0.72
hap         0.70
hement      0.69
ŃĶ          0.66
ARA         0.65
nonetheless 0.64
etheless    0.63
Activations Density 0.336%