INDEX
Explanations
references to Germany and its related terms
Negative Logits
Token       Logit
Majefty     -1.14
Monfieur    -1.12
ſelf        -1.07
Diſ         -1.06
Anſ         -1.02
iſt         -1.01
Houſe       -1.00
estekak     -1.00
auffi       -0.99
itſelf      -0.98
Positive Logits
Token        Logit
να           0.73
German       0.60
(not shown)  0.58
Stu          0.57
да           0.56
高           0.54
Cash         0.51
Peter        0.50
German       0.49
turn         0.48
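The two logit lists read as the extreme ends of one ranking over the vocabulary. The sketch below shows how such lists could be cut from a set of per-token scores. The function name `split_top_k` is hypothetical, the sample rows are copied from the lists above, and the page does not say how the scores themselves were computed.

```python
from typing import List, Tuple

Scored = Tuple[str, float]  # (token text, logit effect)


def split_top_k(scores: List[Scored], k: int = 10) -> Tuple[List[Scored], List[Scored]]:
    """Return (k most negative, k most positive), strongest first in each list.

    A list of pairs is used instead of a dict because the same token text
    can appear more than once (the lists above show "German" twice).
    """
    ascending = sorted(scores, key=lambda pair: pair[1])
    negative = [p for p in ascending if p[1] < 0][:k]
    positive = [p for p in reversed(ascending) if p[1] > 0][:k]
    return negative, positive


# A few rows copied from the lists above; a real vocabulary has far more entries.
sample = [
    ("Majefty", -1.14),
    ("German", 0.60),
    ("να", 0.73),
    ("Monfieur", -1.12),
    ("turn", 0.48),
]
negative, positive = split_top_k(sample, k=2)
for token, value in negative + positive:
    print(f"{token:<10}{value:+.2f}")
```

With `k=10` and the full vocabulary, the two returned lists would correspond to the Negative Logits and Positive Logits tables.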
Activations Density 0.096%
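Activation density is usually read as the fraction of sampled tokens on which the feature fires. A minimal sketch under that assumption follows. The function name `activation_density`, the zero threshold, and the synthetic activations are all hypothetical, and the sample size is chosen only so the result matches the figure above.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 96 active tokens out of 100,000, not taken from the page.
acts = [1.0] * 96 + [0.0] * (100_000 - 96)
density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low means the feature is silent on nearly every token, which fits a feature described as tracking one narrow topic.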