Explanations
academic or research-related content
Negative Logits
outes      -0.20
zones      -0.17
oute       -0.17
uppet      -0.16
myModal    -0.16
øy         -0.16
Zones      -0.15
Zone       -0.15
zones      -0.15
zone       -0.15
Positive Logits
Humb        0.23
.uni        0.22
hab         0.22
Fra         0.20
TU          0.20
Fors        0.20
Cluster     0.19
Gutenberg   0.19
RW          0.19
Helm        0.18
Activations Density: 0.068%