INDEX
Explanations
words and phrases related to specific languages and their dialects
terms related to specific languages and linguistic features
Negative Logits
PB        -0.66
TW        -0.65
OCT       -0.65
SAN       -0.65
GD        -0.65
BU        -0.64
antiv     -0.63
WW        -0.63
COM       -0.62
DIRECT    -0.62
Positive Logits
iasis     1.12
aneers    1.07
idation   1.07
eers      0.99
adies     0.97
arians    0.97
ouses     0.97
acist     0.97
oses      0.96
antes     0.96
Activations Density: 0.210%