Explanations
phrases related to actions or efforts made by individuals or groups towards a specific goal or cause
Negative Logits
Token       Logit
ÃŁ          -0.80
appa        -0.75
CLA         -0.73
Developer   -0.72
larg        -0.71
SHIP        -0.70
DA          -0.66
Administ    -0.66
Laur        -0.66
Delete      -0.65
Positive Logits
Token       Logit
overboard   1.28
tant        1.09
towel       1.01
grenades    0.99
grenade     0.86
punches     0.86
wrench      0.79
thrown      0.76
curve       0.74
contrace    0.73
Activations Density: 0.086%