INDEX
Explanations
phrases related to common practices or activities
references to specific practices or behaviors, particularly those that are controversial or criticized
Negative Logits
gin          -0.78
itialized    -0.76
ilee         -0.73
juven        -0.71
aline        -0.71
éĹĺ          -0.69
raspberry    -0.67
gow          -0.67
worms        -0.66
onge         -0.65
Positive Logits
ually            0.92
Practices        0.78
practiced        0.78
practices        0.77
uality           0.76
practice         0.74
pract            0.70
itual            0.68
practitioners    0.67
reating          0.66
Activation Density: 0.023%
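Activation density is commonly reported as the percentage of sampled token positions on which a feature's activation is positive. The sketch below assumes that definition, which the dashboard itself does not state. It also assumes a flat list of per-token activations for a single feature. The function name `activation_density` and its `threshold` parameter are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes `activations` is a flat, non-empty sequence of
    per-token activations for one feature, and that "active" means > threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count positions where the feature fires above the threshold.
    active = sum(1 for a in activations if a > threshold)
    # Express the fraction of active positions as a percentage.
    return 100.0 * active / len(activations)


# Under this definition, 23 active positions out of 100,000 gives about 0.023%.
sample = [0.0] * 99_977 + [1.0] * 23
print(round(activation_density(sample), 3))
```

At this density the feature fires on roughly 1 token in every 4,300, so most text never activates it.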