INDEX
Explanations
phrases related to leading or direction
phrases indicating causation or consequences
Negative Logits
Mach
-0.68
ager
-0.62
issan
-0.62
ctic
-0.60
agen
-0.59
pton
-0.59
cube
-0.59
Tar
-0.59
afort
-0.59
rehensive
-0.57
Positive Logits
better
0.88
Leading
0.80
ĸļ
0.78
lead
0.77
Īè
0.73
leads
0.71
ļéĨĴ
0.68
wcs
0.68
:{
0.67
leading
0.66
Activations Density 0.011%