Explanations
references to geographical locations and currencies, particularly focusing on mentions of "NZ" (New Zealand) and related terms
references to New Zealand and its locations
Negative Logits
Token          Logit
itbart         -0.75
Initialized    -0.75
abet           -0.75
baugh          -0.73
mentation      -0.73
ible           -0.71
antly          -0.71
gue            -0.70
dy             -0.68
Magikarp       -0.68
Positive Logits
Token          Logit
Zealand        1.33
Auckland       1.10
NZ             0.98
uckland        0.93
XT             0.89
Kiw            0.83
Labour         0.82
NZ             0.82
Canterbury     0.78
Wellington     0.77
Activations Density 0.019%
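
The dashboard does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions on which the feature has a non-zero activation; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the source:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as "active" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 19 of them active.
sample = [0.0] * 99_981 + [1.5] * 19
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.019%
```

Under this reading, a density of 0.019% means the feature fires on roughly 19 of every 100,000 tokens, which fits a narrow feature that responds mainly to New Zealand-related text.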