INDEX
Explanations
text related to website themes
references to various themes
Negative Logits
Token      Logit
rah        -0.83
lishes     -0.77
raham      -0.77
ahn        -0.75
chieve     -0.75
ards       -0.75
ollow      -0.73
entimes    -0.72
held       -0.71
irl        -0.69
Positive Logits
Token      Logit
Theme      0.80
theme      0.79
theme      0.73
themes     0.73
ography    0.73
park       0.70
Forest     0.70
Generator  0.67
parks      0.67
puter      0.65
Activations Density 0.008%
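
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of token positions where the feature's activation is above zero; the dashboard itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions on which
    the feature fires at all. This is an illustrative definition only.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 100,000 token positions, 8 of which fire.
toy = [0.0] * 99_992 + [1.5] * 8
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.008%
```

Under this reading, a density of 0.008% would mean the feature fires on roughly 8 of every 100,000 tokens, so it is a sparse feature.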