INDEX
Explanations
German language related to possibility
Negative Logits
was     -2.48
was     -2.05
Was     -2.00
Was     -1.94
were    -1.93
WAS     -1.54
were    -1.46
Were    -1.45
Were    -1.42
wasn    -1.41
Positive Logits
war      0.67
WAR      0.66
War      0.62
war      0.59
WAR      0.55
War      0.52
guerres  0.49
guerra   0.47
era      0.44
oamen    0.44
Activations Density 1.052%