INDEX
Explanations
references to containers
Negative Logits
bole       -0.81
olulu      -0.72
tz         -0.72
choes      -0.71
ramid      -0.71
hov        -0.70
OTO        -0.68
itters     -0.68
ongyang    -0.67
ichita     -0.66
Positive Logits
container   1.36
containers  1.30
ainers      1.27
Container   1.27
Container   0.95
vier        0.86
container   0.86
vessel      0.81
reef        0.78
iculture    0.77
Activations Density 0.006%