INDEX
Explanations
words related to praise or criticism
occurrences of a specific special character or symbol
Negative Logits
reflex      -0.73
mounts      -0.71
metic       -0.69
cloning     -0.68
camer       -0.66
Patriarch   -0.66
apes        -0.65
conduc      -0.65
vulner      -0.65
Tags        -0.65
Positive Logits
ever        1.03
uthor       1.02
null        0.99
reci        0.94
ternity     0.94
cffffcc     0.92
lean        0.89
ihad        0.88
vernment    0.88
sure        0.87
Activations Density 0.177%
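
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1,000 token positions, 2 of which fire.
sample = [0.0] * 998 + [0.4, 1.2]
density = activation_density(sample)

# Format as a percentage with three decimals, matching the style of the figure above.
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.177% would mean the feature fires on roughly 1.8 of every 1,000 token positions in the sampled text.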