Explanations
No Explanations Found
Negative Logits
存于互联网档案馆    -0.53
fubject    -0.48
pleaſure    -0.45
niad    -0.44
<bos>    -0.44
COMMENT    -0.44
houſe    -0.43
ſtate    -0.43
Hobby    -0.43
Preference    -0.42
Positive Logits
s    0.76
ulates    0.66
lishes    0.66
enters    0.65
ixes    0.63
ontes    0.63
tifies    0.63
loses    0.63
lizes    0.63
rens    0.62
Activations Density 0.000%
No Known Activations
This feature has no known activations.