INDEX
Explanations
words related to importance and influence
terms related to the significance and effectiveness of roles or actions in various contexts
Negative Logits
andals     -0.86
classes    -0.85
events     -0.82
Units      -0.82
codes      -0.80
Classes    -0.79
hops       -0.79
laws       -0.77
skirts     -0.76
facts      -0.76
Positive Logits
foothold     1.02
role         0.92
impression   0.89
stance       0.88
berth        0.84
angle        0.82
burden       0.78
charge       0.77
dive         0.77
effort       0.77
Activations Density 0.229%
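
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number is commonly derived. It assumes that density means the fraction of sampled token positions on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density` and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all; the dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 229 of 100,000 sampled tokens.
sample = [1.5] * 229 + [0.0] * (100_000 - 229)
print(f"{activation_density(sample):.3%}")  # prints 0.229%
```

Under this reading, a density of 0.229% would mean the feature is active on roughly 2 in every 1,000 tokens, which is typical of a sparse feature.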