INDEX
Explanations
action-related terms and verbs describing various activities
Negative Logits
ifies     -0.78
iosis     -0.77
ngth      -0.77
idon      -0.76
overe     -0.74
uate      -0.72
uria      -0.72
fi        -0.71
uther     -0.71
adian     -0.71
Positive Logits
mechanism  0.86
rod        0.78
point      0.76
ability    0.74
circle     0.73
spree      0.72
agent      0.71
Stones     0.71
force      0.71
grounds    0.70
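The two logit lists rank output tokens by how strongly this feature pushes their logits up or down. A common way to produce such lists, and an assumption here since the dashboard does not state its method, is to project the feature's decoder direction through the model's unembedding matrix and keep the k most positive and k most negative entries. The minimal sketch below uses hypothetical names (`logit_effects`, `top_and_bottom`) and toy numbers; it does not reproduce the values above, and the dashboard may apply its own normalization.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: w_unembed is shaped [d_model][vocab_size], so each token's
    effect is the dot product of decoder_dir with that token's column.
    """
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * w_unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) token/effect pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Toy example: a 2-dimensional "model" with a 3-token vocabulary (hypothetical values).
decoder_dir = [1.0, 0.5]
w_unembed = [[0.2, -0.4, 0.6],
             [0.4, 0.2, -0.2]]
vocab = ["mechanism", "ifies", "rod"]

effects = logit_effects(decoder_dir, w_unembed, vocab)
top, bottom = top_and_bottom(effects, k=1)
print(top, bottom)
```

With these toy inputs, "rod" receives the largest boost (about 0.5) and "ifies" the strongest suppression (about -0.3), mirroring how the Positive and Negative Logits lists are read.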
Activations Density 0.190%
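Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The dashboard's actual threshold and token sample are not stated, and the function name `activation_density` is hypothetical. At 0.190%, the feature is active on roughly 19 of every 10,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


# 19 active tokens out of 10,000 sampled tokens (illustrative counts).
acts = [1.0] * 19 + [0.0] * 9981
print(f"{activation_density(acts):.3f}%")  # prints 0.190%
```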