INDEX
Explanations
words related to environmental issues and actions
references to location or relation to regions
Negative Logits
misunder    -0.73
delegation  -0.69
welf        -0.69
mathemat    -0.65
Negro       -0.65
contrace    -0.65
civilian    -0.64
stump       -0.63
jog         -0.63
buggy       -0.62
Positive Logits
ï¸ı         1.52
ï¸          1.03
ski         0.89
£           0.82
sure        0.82
worthiness  0.82
eric        0.82
âĶĢâĶĢ      0.81
âĻ          0.79
forcing     0.78
Activations Density: 0.264%