INDEX
Explanations
phrases related to actions and decisions
repeated phrases or statements indicating emphasis or agreement
Negative Logits
Token       Logit
arthed      -0.75
®           -0.74
™:          -0.69
WAR         -0.67
覚醒        -0.66
ivist       -0.62
Pg          -0.61
*:          -0.60
ailable     -0.58
idden       -0.58
Positive Logits
Token            Logit
..."             0.92
mathemat         0.84
[                0.82
gotta            0.78
.""              0.76
)."              0.76
competitiveness  0.74
cknow            0.73
entimes          0.73
ain              0.71
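This page does not say how the two logit lists are computed. A common approach for SAE feature dashboards is to take the dot product of the feature's decoder direction with each column of the unembedding matrix, then list the tokens with the largest and smallest scores. The sketch below assumes that definition. It uses plain Python lists, and the function name `logit_effects` and the d_model x vocab_size layout of `unembed` are hypothetical choices for illustration.

```python
from __future__ import annotations

from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: score every vocab token by decoder_dir . W_U[:, j].

    `unembed` is assumed to be d_model rows x vocab_size columns.
    Returns (top_k_positive, top_k_negative) as (token, score) pairs.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")

    scores: list[tuple[str, float]] = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))

    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative


if __name__ == "__main__":
    pos, neg = logit_effects(
        decoder_dir=[1.0, -0.5],
        unembed=[[0.2, -0.3, 0.5], [0.4, 0.1, -0.2]],
        vocab=["a", "b", "c"],
        k=1,
    )
    print(pos, neg)
```

With this toy input, token "c" scores about 0.6 and lands in the positive list, while "b" scores about -0.35 and lands in the negative list. A real model would use the actual decoder row and W_U, and probably a vectorized matrix product in place of the Python loop.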
Activation Density: 0.368%
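The dashboard does not state its threshold or its token sample. Activation density is usually read as the percentage of token positions on which the feature is active. The sketch below assumes "active" means a strictly positive activation. Under that reading, 0.368% corresponds to roughly 368 firing positions per 100,000 tokens. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percent of positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:  # assumption: "active" means strictly above threshold
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # 368 active positions out of 100,000 tokens gives a density of about 0.368%.
    sample = [1.0] * 368 + [0.0] * (100_000 - 368)
    print(activation_density(sample))
```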