INDEX
Explanations
phrases related to actions or behaviors
instances of living conditions and societal issues
Negative Logits
staking          -0.76
ģĸ               -0.60
oria             -0.60
testament        -0.55
gradation        -0.53
pson             -0.52
realization      -0.52
adel             -0.52
Found            -0.52
eline            -0.52
Positive Logits
differently       1.27
incorrectly       1.13
indoors           1.00
improperly        1.00
outdoors          0.98
inappropriately   0.97
exclusively       0.93
solely            0.93
excessively       0.91
without           0.89
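The page does not say how these logit lists were produced. A common approach for feature dashboards, assumed here, is to project the feature's decoder vector through the model's unembedding matrix and report the highest- and lowest-scoring tokens. The sketch below illustrates that ranking in pure Python. The function name `feature_logit_effects`, the toy two-dimensional weights, and the four-token vocabulary are all hypothetical. A real pipeline may also apply the final layer norm or other scaling, which this sketch omits.

```python
def feature_logit_effects(decoder_dir, unembed, vocab, k=3):
    """Rank tokens by decoder_dir @ unembed (hypothetical helper, toy setup).

    decoder_dir: list of d_model floats, the feature's decoder vector (assumed)
    unembed: d_model x vocab_size matrix given as a list of rows (assumed)
    vocab: list of vocab_size token strings
    Returns (positive, negative): the k highest and k lowest (token, logit) pairs.
    """
    vocab_size = len(vocab)
    # Direct logit effect of the feature on each token: sum_i d[i] * W_U[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive effect
    order = sorted(range(vocab_size), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return positive, negative


# Toy example with made-up weights (not the real model's values)
pos, neg = feature_logit_effects(
    [1.0, 0.5],
    [[1.0, -1.0, 0.0, 0.5],
     [0.0, 0.0, 3.0, -1.0]],
    ["incorrectly", "staking", "differently", "oria"],
    k=2,
)
print(pos)  # -> [('differently', 1.5), ('incorrectly', 1.0)]
print(neg)  # -> [('staking', -1.0), ('oria', 0.0)]
```

With a real model, the same ranking over the full vocabulary would give lists shaped like the two above, ordered by signed logit effect.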
Activations Density 0.688%
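Activation density is usually read as the share of sampled tokens on which the feature fires. That is an assumption here, since the page does not define it. Under that reading, 0.688% means roughly 688 active tokens per 100,000. The helper below is a hypothetical sketch of the calculation. The zero threshold is an assumed default.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature is considered "active"
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 688 active tokens out of 100,000 sampled tokens
sample = [0.0] * 99_312 + [0.3] * 688
print(activation_density(sample))  # -> 0.688
```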