INDEX
Explanations
phrases related to physical strength and capability
references to power in various contexts
Negative Logits
Von          -0.84
romeda       -0.78
Debor        -0.74
eryl         -0.73
Bei          -0.73
Admission    -0.71
ALK          -0.71
Kitt         -0.69
Observ       -0.67
roit         -0.66
Positive Logits
stroke       0.99
houses       0.96
outage       0.93
train        0.92
lifting      0.92
Reviewer     0.92
chair        0.85
plant        0.84
boats        0.84
grid         0.83
Activations Density 0.035%
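
To make the density figure concrete, here is a minimal sketch. It assumes that "Activations Density" means the fraction of sampled token positions on which the feature fires, which the page does not state explicitly. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only so the arithmetic lands on 0.035%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 7 of 20,000 tokens.
sample = [0.0] * 19_993 + [1.2] * 7
density = activation_density(sample)
print(f"{density:.3%}")  # 0.035%
```

Under that reading, a density of 0.035% corresponds to roughly 35 active tokens per 100,000 sampled, so the feature is sparse.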