Explanations
references to nature, specifically related to bushes or forests
terms related to nature and environment, particularly focusing on forests and geographical locations
Negative Logits
rors        -0.82
ation       -0.80
urrent      -0.79
oded        -0.79
sen         -0.75
oin         -0.73
obal        -0.71
oding       -0.69
icates      -0.69
ible        -0.68
Positive Logits
ャ           0.83
vernment     0.74
cules        0.72
ãĥ           0.71
女           0.68
PLE          0.65
facts        0.64
bom          0.63
onga         0.62
ェ           0.62
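Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that procedure; the function names `logit_effects` and `top_and_bottom`, the toy matrices, and the plain-list representation are hypothetical, and whether this page applies any extra scaling to the displayed values is not stated here.

```python
from typing import Dict, List, Tuple


def logit_effects(
    direction: List[float], unembed: List[List[float]], vocab: List[str]
) -> Dict[str, float]:
    """Dot a (hypothetical) feature direction with each vocab column of the unembedding.

    direction: length d_model
    unembed:   d_model rows, each with len(vocab) entries
    """
    return {
        tok: sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k lowest-scoring and k highest-scoring tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three tokens.
    effects = logit_effects(
        [1.0, -0.5],
        [[0.2, -0.4, 0.9], [0.1, 0.3, -0.2]],
        ["a", "b", "c"],
    )
    neg, pos = top_and_bottom(effects, 1)
    print("negative:", neg)
    print("positive:", pos)
```

In the toy example, token `b` scores about -0.55 and token `c` about 1.0, so they head the negative and positive lists respectively.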
Activation Density 0.216%
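Activation density is read here as the percentage of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and its `threshold` parameter are assumptions for illustration. Under that reading, 0.216% means the feature fires on roughly 216 of every 100,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


if __name__ == "__main__":
    # Hypothetical data: 216 firing tokens out of 100,000.
    sample = [0.0] * 99_784 + [1.0] * 216
    print(f"{activation_density(sample):.3f}%")
```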