INDEX
Explanations
Geographical locations and place-related proper nouns, specifically in the context of news or reports.
Negative Logits
houſe            -0.77
ſelf             -0.76
ſmall            -0.75
purpoſe          -0.73
pleaſure         -0.71
ſta              -0.70
ſeveral          -0.70
ſtate            -0.69
AxisAlignment    -0.66
perſon           -0.66
Positive Logits
locally          0.57
<<<<<<<<<<<<<<   0.57
בארץ             0.54
locally          0.54
today            0.53
<=",             0.53
енча             0.52
PerformLayout    0.51
iecie            0.49
estekak          0.49
Activations Density: 1.012%