Explanations
phrases related to instructions or specific actions
Negative Logits
Token      Logit
hap        -0.82
uga        -0.79
iosyn      -0.77
OST        -0.74
uay        -0.71
ordable    -0.70
regular    -0.69
otos       -0.69
ulz        -0.68
obal       -0.68
Positive Logits
Token       Logit
ratio       1.24
syndrome    1.15
Ratio       1.07
ratios      1.03
clause      0.99
trope       0.95
mentality   0.95
fallacy     0.94
Syndrome    0.90
initiative  0.90
Activations Density: 1.823%